Author Posts

July 24, 2017 at 4:59 pm

Hello,
I am new to PowerShell and am playing with basic data operations and have a question about creating a lagged variable within a datatable. I know how to do this in R, or using SQL ... but what would be a good native PowerShell approach when dealing with datatables?

Below is a small example data set and the desired output below that. Note: it is lagged by group (ID).

Please let me know if you have any ideas. Thank you!

$table = New-Object system.Data.DataTable
$col1 = New-Object system.Data.DataColumn ID,([string])
$col2 = New-Object system.Data.DataColumn EFFDT,([datetime])
$col3 = New-Object system.Data.DataColumn Var,([string])
$table.columns.add($col1)
$table.columns.add($col2)
$table.columns.add($col3)
$row = $table.NewRow()
$table.Rows.Add("A", "1/1/2017", "1")
$table.Rows.Add("A", "1/2/2017", "1")
$table.Rows.Add("A", "1/3/2017", "2")
$table.Rows.Add("B", "1/1/2017", "2")
$table.Rows.Add("B", "1/2/2017", "3")
$table.Rows.Add("B", "1/3/2017", "3")
$table.Rows.Add("B", "1/5/2017", "4")
$table | format-table -AutoSize

# desired output
# ID EFFDT Var Lag1Var
# — —– — ——-
# A 1/1/2017 12:00:00 AM 1 1
# A 1/2/2017 12:00:00 AM 1 2
# A 1/3/2017 12:00:00 AM 2 –
# B 1/1/2017 12:00:00 AM 2 3
# B 1/2/2017 12:00:00 AM 3 3
# B 1/3/2017 12:00:00 AM 3 4
# B 1/5/2017 12:00:00 AM 4 –

July 24, 2017 at 5:05 pm

I'm not a data expert – can you tell me what a "lagged" field is?

July 24, 2017 at 5:17 pm

Sure. I essentially mean to grab the value from the previous row in a time-series table.

In the example below, there are IDs A and B. Each row has a date, followed by a field that is an integer in this example. The lag is simply the value from the previous row for each ID and is shown in bold in the last column below.

ID, Date, Var, LaggedVar
A, 1/1/2017, 1, 1
A, 1/2/2017, 1, 2
A, 1/3/2017, 2, –
B, 1/1/2017, 2, 3
B, 1/2/2017, 3, 3
B, 1/3/2017, 3, 4
B, 1/5/2017, 4, –

July 24, 2017 at 5:21 pm

So, PowerShell as a data manipulation language isn't so much a thing. Data tables in PowerShell, for example, aren't "in" PowerShell as much as they're "in" .NET Framework; PowerShell doesn't have anything native that does this. So you're obviously in .NET, but if it has some specific feature for that task, I'm afraid I don't know it (StackOverflow might be a good spot to ask). If I had to do what you're doing, I'd just save the relevant field in a variable each time through the loop, and then use that variable in setting the next row. Variables in PowerShell aren't scoped to loops, so they'll persist through each iteration.