Outputting object from helper function

This topic contains 2 replies, has 3 voices, and was last updated by Sam Boutros 1 month, 1 week ago.

  • #88994

    John Steele
    Participant

    I've started on a script to find any Personally Identifiable Information (PII) in a set list of file types. I'm currently just testing .docx files. The object from the Find-PIIWord helper function isn't being output as I expected. As shown at the very bottom, the objects are all displayed at once at the very end, after all the verbose and warning output, rather than each time the function is called.

    Should I be returning the object from Find-PIIWord back to the main function and outputting the object from there?
    Is a function call from a switch statement really the right approach here?
    Any other critiques would be greatly appreciated.

    Function Find-PII {
    
        [cmdletbinding()]
    
        Param (
            
            #Not mandatory, so the $PWD default applies when -Path is omitted
            [Parameter(ValueFromPipeline = $true,
                       ValueFromPipelineByPropertyName = $true)]
            [Alias("FilePath")]
            [string[]] $Path = $PWD
        )
    
        Begin {
    
            #Pipeline input is not bound in Begin, so paths are resolved in Process instead
    
            #Has 9 digits, may be split as xxx-xx-xxxx by dashes or spaces
            $patternSocial = '(\d{3}[- ]\d{2}[- ]\d{4})|(\d{9})'
    
            #Starts with 4 and has 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternVisa = '(4\d{3}[- ]\d{4}[- ]\d{4}[- ]\d{4})|(4\d{15})'
    
            #Starts with 51-55 and has 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternMC = '(5[1-5]\d{2}[- ]\d{4}[- ]\d{4}[- ]\d{4})|(5[1-5]\d{14})'
    
            #Starts with 34 or 37 and has 15 digits, may be split as xxxx-xxxxxx-xxxxx by dashes or spaces
            $patternAMEX = '(3[47]\d{2}[- ]\d{6}[- ]\d{5})|(3[47]\d{13})'
    
            #Starts with 6011 or 65 and has 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternDiscover = '(6(?:011|5\d{2})[- ]\d{4}[- ]\d{4}[- ]\d{4})|(6(?:011|5\d{2})\d{12})'
    
            New-PIITempFolder
            $PIITemp = Get-Item -Path "$env:TEMP\FindPII"
        }
    
        Process {
            
            #Converts relative paths to absolute paths (done here so piped values are resolved too)
            $resolvedPath = Convert-Path -Path $Path
            $files = Get-ChildItem -Path $resolvedPath -Include '*.docx' -Recurse
            #$files = Get-ChildItem -Path $Path -Include '*.docx', '*.xlsx', '*.pdf', '*.pptx', '*.txt' -Recurse
    
            foreach ($file in $files) {
            
                switch ($file.Extension) {
                
                    '.docx' {Find-PIIWord -InputObject $file}
                    #.xlsx {Find-PIIExcel}
                    #.pptx {Find-PIIPowerPoint}
                    #.pdf {Find-PIIPdf}
                    #.txt {Find-PIITxt}
                    #default {break}
                }
            }
        }
    
        End {
    
            #Remove-PIITempFolder
        }
    }
    
    Function Find-PIIWord {
        
        param (
            
            [Parameter(ValueFromPipeline = $true)]
            [System.IO.FileInfo] $InputObject
        )
    
        Write-Verbose "Looking for PII in $($InputObject.Name)"
    
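        #$PIITemp and the $pattern* variables are not parameters; they resolve from the calling Find-PII scope
        $docxTemp = "$PIITemp\$($InputObject.Name)"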
    
        New-Item -Path "$PIITemp\docx" -ItemType Directory -Force | Out-Null
        Copy-Item -Path $InputObject.FullName -Destination "$docxTemp.zip" -Force
        Expand-Archive -Path "$docxTemp.zip" -DestinationPath "$PIITemp\docx\" -Force | Out-Null
    
        [xml] $docx = Get-Content -Path "$PIITemp\docx\word\document.xml"
        $PIIFound = $docx.document.body.p.r.t | Select-String -Pattern $patternSocial, $patternVisa -Quiet
    
        if ($PIIFound) {
    
            $obj = [pscustomobject] @{
                
                'Name' = $InputObject.Name;
                'Length' = $InputObject.Length;
                'LastWriteTime' = $InputObject.LastWriteTime;
                'FullName' = $InputObject.FullName
            }
    
            Write-Warning "PII found in $($InputObject.name)"
            Write-Output $obj
        }
    
        #Remove-Item -Path "$PIITemp\docx" -Recurse -Force
    }
    
    Function New-PIITempFolder {
        
        if (-not (Test-Path -Path "$env:TEMP\FindPII")) {
        
            New-Item -Path $env:TEMP -Name FindPII -ItemType Directory | Out-Null
        }
    }
    
    Function Remove-PIITempFolder {
        
        if (Test-Path -Path "$env:TEMP\FindPII") {
        
            Remove-Item -Path "$env:TEMP\FindPII" -Recurse -Force
        }
    }
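
Patterns of the kind defined in Find-PII can be checked in isolation before they are wired into the file scan. The following is a minimal sketch, not the script's confirmed behavior: the sample strings are made up, and the `\b` word-boundary anchors are an assumption added here so that a 9-digit run inside a longer number is not reported. Inside a character class, `|` is a literal pipe rather than alternation, so `[- ]` is enough to match a dash or a space.

```powershell
# Sketch patterns: [- ] matches only a dash or a space;
# \b (an assumption, not in the original script) keeps matches off longer digit runs.
$patternSocial = '\b(\d{3}[- ]\d{2}[- ]\d{4}|\d{9})\b'
$patternVisa   = '\b(4\d{3}[- ]\d{4}[- ]\d{4}[- ]\d{4}|4\d{15})\b'

# Made-up sample strings, for illustration only
$samples = @(
    'SSN 123-45-6789 on file'
    'Card 4111 1111 1111 1111'
    'Order 1234567890123'
    'No identifiers here'
)

# Build one result object per sample, recording which patterns matched
$results = foreach ($text in $samples) {
    [pscustomobject] @{
        Text    = $text
        HasSSN  = $text -match $patternSocial
        HasVisa = $text -match $patternVisa
    }
}

$results | Format-Table -AutoSize
```

With these anchors, the 13-digit order number is not flagged as a Social Security number, whereas an unanchored `\d{9}` would match its first nine digits.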
    

    The output of my test is below:

    PS G:\Microsoft\Powershell> Find-PII -Path . -Verbose
    VERBOSE: Looking for PII in Resume.docx
    WARNING: PII found in Resume.docx
    
    VERBOSE: Looking for PII in test.docx
    WARNING: PII found in test.docx
    Name        Length LastWriteTime          FullName                           
    ----        ------ -------------          --------                           
    Resume.docx  27689 12/12/2017 12:22:58 AM G:\Microsoft\Powershell\Resume.docx
    test.docx    11970 12/12/2017 12:26:21 AM G:\Microsoft\Powershell\test.docx
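
One likely explanation for this ordering, offered as a hedged note rather than a confirmed diagnosis, is that it is a display effect. Verbose and warning messages are written to the host immediately. Objects on the success stream go through the default formatter, which, as far as I understand it, may briefly hold back table output in Windows PowerShell 5 and later while it works out column widths. The sketch below uses a hypothetical stand-in function, Get-Demo, and a hypothetical $seen list to show that each object is emitted when its call runs and can be consumed downstream right away.

```powershell
# Hypothetical stand-in for Find-PIIWord: warns, then emits one object per call.
function Get-Demo {
    [CmdletBinding()]
    param([int] $Id)

    Write-Warning "Processing item $Id"
    [pscustomobject] @{ Id = $Id; Emitted = Get-Date }
}

# Records the order in which objects arrive downstream.
$seen = [System.Collections.Generic.List[int]]::new()

1..3 | ForEach-Object { Get-Demo -Id $_ } | ForEach-Object {
    # Runs as soon as each object is emitted, before the next Get-Demo call starts.
    $seen.Add($_.Id)
    Write-Host "Received object $($_.Id)"
}
```

Each "Received object" line appears directly after the matching warning, which suggests the helper is already streaming its objects and only the table rendering is deferred. If that holds, returning the object to the main function first would not change the order on screen.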
    
  • #89005

    postanote
    Participant

    Quick question: what is your rationale for doing this manually, other than as a learning exercise or an offline forensic effort?

    Don't get me wrong, there is nothing wrong with doing this, as MS has specific articles on the topic:

    Security Watch Where Is My PII?
    'technet.microsoft.com/en-us/library/2008.04.securitywatch.aspx'

    But the enterprise approach would be to use a Windows Server FSRM/FCI deployment. Such a deployment will scan your storage resources for whatever strings you find prudent and take action on matching files: move them, protect them with RMS policies, etc.

    FSRM and FCI: Frequently Asked Questions
    'technet.microsoft.com/en-us/library/ee344836(v=ws.10).aspx'

    FCI: CLASSIFIED
    'technet.microsoft.com/en-us/library/ee681552.aspx'

    Classifying files based on location and content using the File Classification Infrastructure (FCI) in Windows Server 2008 R2
    'blogs.technet.microsoft.com/filecab/2009/05/11/classifying-files-based-on-location-and-content-using-the-file-classification-infrastructure-fci-in-windows-server-2008-r2'

    Automating the doc protection using FCI integration with RMS bulk protection tool
    'blogs.technet.microsoft.com/amolrb/2010/02/09/automating-the-doc-protection-using-fci-integration-with-rms-bulk-protection-tool'

    Protect everything: using FCI to protect files of any type with Windows Server 2012
    'cloudblogs.microsoft.com/enterprisemobility/2012/11/09/protect-everything-using-fci-to-protect-files-of-any-type-with-windows-server-2012'

    You can even use AIP (basically RMS and FCI in the cloud) to do this without any additional servers.
    'cloudblogs.microsoft.com/enterprisemobility/2016/06/22/announcing-azure-information-protection'

    Even consumers can do this with the free Azure AIP service.
    RMS for individuals and Azure Information Protection
    'docs.microsoft.com/en-us/information-protection/understand-explore/rms-for-individuals'

    All of the above allow you to scan data for content strings and do what you will from there.

    As for your question:
    'Should I be returning the object from Find-PIIWord back to the main function and outputting the object from there?'

    Are you saying you are not getting the results you'd expect, or that you just want to push it out a different way?

    There is always more than one way to do X or Y.

    The question is, does it work for you and your organization as is?

    Is it easily understood by those you'd pass this on to?
    Is it designed to be as extensible as needed?
    Is it easily maintainable as is?
    etc...

    There can be more elegant ways to do things, but that too can depend heavily on opinion, personal stance, habits, beliefs, etc.

  • #89090

    Sam Boutros
    Participant
