June 10, 2018 at 3:21 pm

Hi All,

We have a requirement to split a 1 GB text file into multiple files of 300 MB each:

1. 1st file: 300 MB
2. 2nd file: 300 MB
3. 3rd file: 300 MB
4. 4th file: 100 MB

Below is the script we have used to split the file into multiple files of 300 MB each. The script does split the file, but some of the output files come out at 250 MB and others at 300 MB.

Can anyone help me split the file so that the first 3 files all have the same size?

Script

# split test
$sw = New-Object System.Diagnostics.Stopwatch
$sw.Start()
$filename = "E:\So\bkp.txt"
$rootName = "E:\Ta\"
$ext = ".txt"

$linesperFile = 300000 # 300k lines per output file (a line count, not a byte size)
$filecount = 1
$reader = $null
$writer = $null
try {
    $reader = [io.file]::OpenText($filename)
    try {
        "Creating file number $filecount"
        # $ext already contains the dot, so the format string must not add a second one
        $writer = [io.file]::CreateText("{0}{1}{2}" -f ($rootName, $filecount.ToString("000"), $ext))
        $filecount++
        $linecount = 0

        while ($reader.EndOfStream -ne $true) {
            "Reading $linesperFile"
            while (($linecount -lt $linesperFile) -and ($reader.EndOfStream -ne $true)) {
                $writer.WriteLine($reader.ReadLine())
                $linecount++
            }

            if ($reader.EndOfStream -ne $true) {
                "Closing file"
                $writer.Dispose()

                "Creating file number $filecount"
                $writer = [io.file]::CreateText("{0}{1}{2}" -f ($rootName, $filecount.ToString("000"), $ext))
                $filecount++
                $linecount = 0
            }
        }
    } finally {
        # The writer is still $null if the first CreateText call failed
        if ($null -ne $writer) { $writer.Dispose() }
    }
} finally {
    # The reader is still $null if OpenText failed
    if ($null -ne $reader) { $reader.Dispose() }
}
$sw.Stop()

Write-Host "Split complete in" $sw.Elapsed.TotalSeconds "seconds"

Thanks in advance.

June 14, 2018 at 4:17 am

Since you want to split the file by size, I believe the code should count the bytes copied rather than the number of text lines copied. Lines vary in length, so a fixed line count produces files of different sizes. Here is an example. You can save it to a file (e.g. Out-FileChunks.ps1) and then run:

. .\Out-FileChunks.ps1 -Path "SourceFile" -OutputPath "C:\SomeOutputFolder" -ChunkSizeBytes 300MB

[CmdletBinding()]
PARAM 
(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [string] $Path,

    [Parameter(Mandatory=$true)]
    [string] $OutputPath,

    [Parameter()]
    [int] $ChunkSizeBytes = 300MB
)

$bufferSize = 24 * 1024;
$buffer = [System.Byte[]]::CreateInstance([System.Byte], $bufferSize)

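# Create the output folder if it doesn't exist.
if ( -not (Test-Path $OutputPath))
{
    $null = New-Item -Path $OutputPath -ItemType Directory -Force
}

# .NET file APIs resolve relative paths against the process working directory,
# not the current PowerShell location, so convert both paths to absolute ones.
$Path = (Resolve-Path $Path).ProviderPath
$OutputPath = (Resolve-Path $OutputPath).ProviderPath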

$fileExtension = [System.IO.Path]::GetExtension($Path)
$fileNameRoot = $Path | Split-Path -Leaf
$outputFileNameRoot = [System.IO.Path]::GetFileNameWithoutExtension($fileNameRoot)

try
{
    $inputStream = [System.IO.File]::OpenRead($Path)
    $fileCount = 0;

    # Loop through the entire file.
    while ($inputStream.Position -lt $inputStream.Length)
    {
        $outputFileName = [string]::Format("{0}{1}{2}", $outputFileNameRoot, $fileCount.ToString("000"), $fileExtension)
        $outputFilePath = Join-Path $OutputPath $outputFileName

        Write-Progress "Writing file chunk $outputFilePath"
        # Create output files of up to $ChunkSizeBytes bytes each.
        $outputStream = [System.IO.File]::Create($outputFilePath)
            
        $chunkBytesRemaining = $ChunkSizeBytes

        while ($chunkBytesRemaining -gt 0)
        {
            $bytesRead = $inputStream.Read($buffer, 0, [System.Math]::Min($chunkBytesRemaining, $bufferSize))

            if ( $bytesRead -le 0 )
            {
                # nothing left to read so done writing all chunks.
                break;
            }

            $outputStream.Write($buffer, 0, $bytesRead);
            $chunkBytesRemaining -= $bytesRead;
        }
            
        $outputStream.Dispose();
        $outputStream = $null;

        Write-Progress "Completed writing file chunk $outputFilePath"
        $fileCount++;
    }
}
finally
{
    if ( $inputStream -ne $null )
    {
        $inputStream.Dispose();
        $inputStream = $null;
    }

    if ( $outputStream -ne $null )
    {
        $outputStream.Dispose();
        $outputStream = $null;
    }
}
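
One caveat with a pure byte split of a text file: a chunk boundary can fall in the middle of a line, or even in the middle of a multi-byte character. If whole lines matter more than exact sizes, something along the following lines may help. This is only a sketch under stated assumptions: the function name Split-TextBySize and its parameters are made up for illustration, the input is assumed to be UTF-8, the output folder is assumed to exist already, and line endings are rewritten to the platform default. Each output file stays at or below the limit unless a single line is longer than the limit by itself.

```powershell
# Hypothetical helper (not part of the script above): split a text file by size,
# but only ever start a new output file at a line boundary.
function Split-TextBySize {
    param(
        [Parameter(Mandatory=$true)] [string] $Path,
        [Parameter(Mandatory=$true)] [string] $OutputPath,   # assumed to exist already
        [long] $MaxBytes = 300MB
    )

    # .NET resolves relative paths against the process directory, so make them absolute.
    $Path = (Resolve-Path $Path).ProviderPath
    $OutputPath = (Resolve-Path $OutputPath).ProviderPath

    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $extension = [System.IO.Path]::GetExtension($Path)

    # Assumption: the file is UTF-8; output is written as UTF-8 without a BOM.
    $encoding = New-Object System.Text.UTF8Encoding($false)
    $newlineBytes = $encoding.GetByteCount([Environment]::NewLine)

    $reader = New-Object System.IO.StreamReader($Path, $encoding)
    $writer = $null
    $fileCount = 0
    [long] $bytesInFile = 0
    try {
        # ReadLine returns $null at end of file; an empty line is returned as "".
        while ($null -ne ($line = $reader.ReadLine())) {
            $lineBytes = $encoding.GetByteCount($line) + $newlineBytes

            # Open the first file, or roll over when this line would not fit.
            # A line longer than $MaxBytes still gets a file of its own.
            if ($null -eq $writer -or ($bytesInFile -gt 0 -and $bytesInFile + $lineBytes -gt $MaxBytes)) {
                if ($null -ne $writer) { $writer.Dispose() }
                $name = '{0}{1:000}{2}' -f $baseName, $fileCount, $extension
                $writer = New-Object System.IO.StreamWriter((Join-Path $OutputPath $name), $false, $encoding)
                $fileCount++
                $bytesInFile = 0
            }

            $writer.WriteLine($line)
            $bytesInFile += $lineBytes
        }
    }
    finally {
        if ($null -ne $writer) { $writer.Dispose() }
        $reader.Dispose()
    }
}
```

After dot-sourcing the file that defines the function, it could be called as Split-TextBySize -Path "E:\So\bkp.txt" -OutputPath "E:\Ta" -MaxBytes 300MB. The trade-off against the byte-exact script is that the first files end up slightly under 300 MB rather than exactly at it.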

June 16, 2018 at 9:02 am

You can split a text file into multiple smaller text files from the command line using GNU split:
split -l 5000 -d --additional-suffix=.txt "$FileName" file
-l 5000: split the file into files of 5,000 lines each.
-d: use numeric suffixes. This makes the suffix go from 00 to 99 by default instead of aa to zz.
--additional-suffix: lets you specify an extra suffix, here the file extension.
$FileName: name of the file to be split.
file: prefix for the names of the resulting files.
To split by size rather than by line count, replace -l 5000 with -b 300M (exactly 300 MiB per file, which may cut a line in two) or -C 300M (as many whole lines as fit in 300 MiB).
As always, check out man split for more details.

On macOS, the default version of split is apparently more limited and does not support options such as --additional-suffix. You can install the GNU version using the following command:
brew install coreutils
You can then run the above command by replacing split with gsplit. Check out man gsplit for details.