Extracting data from excel and updating variables

Welcome Forums General PowerShell Q&A Extracting data from excel and updating variables

Viewing 9 reply threads
  • Author
    Posts
    • #256778
      Participant
      Topics: 1
      Replies: 3
      Points: 20
      Rank: Member

      Hi all,

      I am trying to take content from an excel document to update and append some other code, and struggling to get it to work efficiently. I want to look at the excel file, find new entries, grab 3 cells information and insert them into another PS script.

      My excel table looks like the below

      Subscription Name Category 1 Category 2
      Sub1 A X
      Sub2 B Y
      Sub3 C Z

      And my code looks like below…

      Subscription Category 1 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”A”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”B”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”C”

      Subscription Category 2 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”X”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”Y”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”Z”

      This information is then used in PowerBI as a Measure, so don’t question the format in terms of PowerShell (it works 🙂 )

      What I am looking to do, is run a PS script to check the excel file for new lines, take the 3 pieces of information and append them to the end of the code in the lines above!

      Thanks in advance!

    • #256781
      Senior Moderator
      Topics: 9
      Replies: 1309
      Points: 4,782
      Helping Hand
      Rank: Community Hero

      Not really clear on the content you shared above. CSV is fine , but the other Subscription Category 1 and 2, What are those. Where and How in the code are you putting them ?

      Do you have any code which can be shared here ? IT should make anyone helpful to understand your issue clearly.

    • #256808
      Participant
      Topics: 1
      Replies: 3
      Points: 20
      Rank: Member

      Hi again,

      Sorry  – I know it is a bit vague.

      Basically, I currently have the below in a PS1 file…

      Subscription Category 1 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”A”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”B”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”C”

      Subscription Category 2 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”X”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”Y”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”Z”

      So lets assume the Excel file gets updated, so a new line is added and now looks like the following

       

      Subscription Name Category 1 Category 2
      Sub1 A X
      Sub2 B Y
      Sub3 C Z
      Sub4 D W

      I want to be able to run a new PS script to look at the table in excel and add the following new lines

      Subscription Category 1 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”A”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”B”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”C”

      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub4″),”D”

      Subscription Category 2 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”X”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”Y”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”Z”

      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub4″),”W”

      • This reply was modified 5 days, 15 hours ago by Black3y03.
    • #256835
      Participant
      Topics: 13
      Replies: 1753
      Points: 3,136
      Helping Hand
      Rank: Community Hero

      The code you are referencing is not Powershell. It appears to be Excel expression language which is related to VBA. There is no Powershell posted that I can see thus far. Typically, to detect new lines you want have a date or a copy of the data from the last run to compare against the current data. Where is the data coming from originally? If Excel can be taken out of the equation, that would be a much better solution.

    • #256844
      Participant
      Topics: 1
      Replies: 3
      Points: 20
      Rank: Member

      ***

      Hi,

      I am aware that the output is not Powershell, I just store it in a PS1 file as it is easier to manage than a .txt file. I don’t know where to start in PowerShell to get it to look at the Excel file (which has to be Excel, as thats how the users currently record the data) and populate the resulting output files, be them PS1 or txt.

      I am able to add a new field into the Excel table that records date and time of entry, and if I were to set the PowerShell script to run on a weekly basis I assume I could set it to only look at rows where the date/time was in the last week?

      The powershell below was taken from somewhere else but I am not having any luck modifying it…

      • This reply was modified 5 days, 13 hours ago by Black3y03.
      • This reply was modified 5 days, 12 hours ago by grokkit. Reason: code formatting
    • #256865
      Participant
      Topics: 13
      Replies: 1753
      Points: 3,136
      Helping Hand
      Rank: Community Hero

      There is a module that may accelerate things:

      Anything you are doing in Excel is still using an old Excel COM object, Powershell is just executing the old COM code. For instance, if I wanted to get data from the last row in a certain column, I would run a macro and search for how to do that in vba:

      https://www.thespreadsheetguru.com/blog/2014/7/7/5-different-ways-to-find-the-last-row-or-last-column-using-vba

      Then you have to convert that VBA to Powershell, so I would search for something like ‘Powershell find xlCellTypeLastCell’

      • This reply was modified 5 days, 11 hours ago by grokkit. Reason: whitespace cleanup
    • #256874
      Senior Moderator
      Topics: 10
      Replies: 161
      Points: 871
      Helping Hand
      Rank: Major Contributor

      Before you can write a script that solves your problem, you need to have a clear understanding of what you have to start with, what you need to end with, and a process that will move you from the first to the second. In order to get help writing this script, you need to be able to convey those things to other people in a clear and concise way. So, let me try to break this down:

      1. You have an Excel file that is generated by an external company process over which you have no control. The Excel file contains data that you need to extract.
      2. You need an output file that contains specific data from the Excel file, and you want to update the output file regularly as new information is added to it.
      3. You do not have a clear idea of a process to get from (1) to (2).

      Let’s take a closer look at these.
      In regard to (1), an Excel file (Workbook) contains pages called Worksheets (Sheets) which contain cells with information in them. Excel files also have formatting and frequently empty columns and rows that separate sections for the sake of human readability. This presents a problem when dealing with the file programmatically, because you cannot predict (with an algorithm) which columns/rows will be empty when trying to extract data. When dealing with spreadsheets from a scripting perspective, it’s common to convert them to CSV files, which effectively strips all the formatting and extra fancy features of an Excel file and just gives you the values stored in individual cells, separated with commas and line breaks (columns and rows). The CSV can be read as a simple TXT file, or easily imported into PowerShell and manipulated as a data object.

      In regard to (2), the purpose of your output file is unclear. Are you generating a record of changes? Are you extracting a specific set of data from a spreadsheet that contains more information than you need? Does the result need to be human readable? Are you creating a file that will later be used as input for another process? Does the result need to fit a particular format so that it can be used in another program?

      If you just need a specific section of the Excel file, converting that particular Worksheet to a CSV might suit your needs and is fairly simple to accomplish in PowerShell. Rather than run a comparison against a previous version of the file (to discover new entries), you could simply overwrite the existing file with a new file whenever the script is executed. Is the date of addition of new information important? Does the output need to be a historical record of changes?
      And on that note, what should happen in the output file if information is deleted from the Excel file? Do you also need the script to discover cells that have been removed? That will be more complicated.

      In order to create the process (3) you will need to answer most if not all of these questions, and then figure out a logical step-by-step method to get from (1) to (2). Once you’ve done that you can write a PowerShell script that represents that process. Without it, you’re stumbling around blind.

    • #256949
      Participant
      Topics: 1
      Replies: 3
      Points: 20
      Rank: Member

      Many thanks to you both for your replies… I fear I may have confused the situation somewhat so I shall seek to clarify the position I am in, in the way it has been articulated but Grokkit above.

      1. I have a bunch of fields in an excel document that is updated externally
        1. I am able to add fields if required, such as a date or time of entry of a new row
        2. I am able to convert to CSV as part of this process, if required
        3. The table looks similar to below, with more and more entries being added by users
      Subscription Name Subscription Category 1 Subscription Category 2 Entered
      Sub1 A X 1/4/20 09:00
      Sub2 B Y 1/4/20 09:00
      Sub3 C Z 1/4/20 09:00
      Sub4 D W 1/4/20 09:00

      2. I have external code that I use in another application (PowerBI) that looks like the lines below. I have a means to automate this process on a schedule, so there is no need for the PowerShell to control or manage when its run.

      Subscription Category 1 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”A”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”B”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”C”

      Subscription Category 2 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”X”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”Y”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”Z”

      This can be stored in any file type, such as .txt if easiest. I simply used PowerShell ISE when doing this manually as it made if easier for me to format and idefity the fields I was manually updating/adding

      3. As per my example in the table above, a user has entered a new line “Sub4”. I would like to create a PowerShell script that opens this spreadsheet (potentially converting to CSV first?) and updates the PowerBI code in an output file. In the example above, this would result in the output file now looking like below, with the bold values identifying what has been pulled from the CSV.

      Subscription Category 1 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”A”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”B”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”C”

      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub4″),”D

      Subscription Category 2 =
      If(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub1″),”X”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub2″),”Y”
      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub3″),”Z”

      ,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription'[Subscription Name],”Sub4″),”W

      Does that make sense now?

    • #257039
      Senior Moderator
      Topics: 10
      Replies: 161
      Points: 871
      Helping Hand
      Rank: Major Contributor

      Yes, that clarifies a few things.

      Because you don’t care about the date or time in the output file, I suggest not bothering with tracking that information at all and not attempting to do any comparison between new and old versions of the input file. Read the contents of the input file each time, process them into the format you need, and then write the processed content to your output file. As a bonus, this will also account for any information deleted from the input file without you having to write specific code to handle it.

      You can get the data out of the Excel file directly, and without installing extra PS modules, although using a module that’s built for it may be easier. This article will lead you through finding data in an Excel file and building an object with it: Finding Data in Excel Using PowerShell. Once you can do that reliably, it should be relatively easy to build the output lines from the collected data.

    • #257060
      Participant
      Topics: 13
      Replies: 1753
      Points: 3,136
      Helping Hand
      Rank: Community Hero

      Let’s start with something like this. Excel is the devil. It uses an old COM object and is painful. Take something like this:

      http://mekalikot.blogspot.com/2014/08/read-and-get-values-from-excel-file.html

      To extract information you have to emulate what you would do the in gui to select the cells and the loop through them to get data. Powershell uses PSObject, so you would have to loop through it to generate a PSObject to then export to a CSV to keep a copy for comparison. A bit ugly. Powershell can import a CSV directly into a PSObject and then you get all of the power of Powershell to do comparisons, filters, etc. on the content. If it were me, I would just try to grab the sheet I’m looking for and save it as a CSV like this:

      http://blog.powercram.com/2015/05/use-powershell-to-save-excel-worksheet.html

      This code create a CSV from the XLSX, if you have a previous csv to perform deltas then it will show you what changed. Otherwise, you and comparing nothing, but Compare-Object is used to output just for consistent output:

      The first run you have no previous data to perform deltas, so it would just be like this:

      When I ran this, it generated the previous file. The next time I ran it, I removed a row and added a new row to the XLSX:

      The side indicators show what in the previous (on the left) and the current file (on the right) based on ReferenceObject and DiffrenceObject respectively. If you run it again, then you would see that it’s all == because we overwrote the previous file. You could keep dated copies of the previous files for reference if you wanted to see changes. You could email the changes only to the subscriptions. Now that it’s a PSObject, you have the full flexibility of Powershell.

Viewing 9 reply threads
  • You must be logged in to reply to this topic.