Structured data containing null causes misalignment

This topic contains 2 replies, has 2 voices, and was last updated by Michael Maher 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
    #36997
    Michael Maher
    Participant

    Hi Folks,

    I was determined to work this one out without seeking help. However, after four days, during which I did learn a bit, I have to admit defeat.

    Here's the code to retrieve data.

    $teamID = '13'
    $squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
    $data = ($squads.ParsedHtml.getElementsByTagName('td') | Where { $_.InnerText }).OuterText
    # Group the flat list of cell text into records of six fields each
    $results = for ($i = 0; $i -lt $data.Count; $i += 6) {
        [PSCustomObject]@{
            Name      = $data[$i]
            Birthdate = $data[$i+1]
            Age       = $data[$i+2]
            Club      = $data[$i+3]
            Apps      = $data[$i+4]
            Goals     = $data[$i+5]
        }
    }
    $results
    

    In the sample results below, you can see that a null value for the club of the player Leander Dendoncker knocks every subsequent field out of alignment.

    Name : Thibaut Courtois
    Birthdate : 11/05/1992
    Age : 23
    Club : Chelsea
    Apps : 8
    Goals : 3

    Name : Nacer Chadli
    Birthdate : 02/08/1989
    Age : 26
    Club : Tottenham
    Apps : 4
    Goals : 1

    Name : Leander Dendoncker
    Birthdate : 15/04/1995
    Age : 20
    Club : –
    Apps : –
    Goals : Marouane Fellaini

    Name : 22/11/1987
    Birthdate : 28
    Age : Man. United
    Club : 8
    Apps : 4
    Goals : Eden Hazard

    Name : 07/01/1991
    Birthdate : 25
    Age : Chelsea
    Club : 9
    Apps : 5
    Goals : Adnan Januzaj

    Name : 05/02/1995
    Birthdate : 21
    Age : Man. United
    Club : 1
    Apps : –
    Goals : Vincent Kompany

    Name : 10/04/1986
    Birthdate : 29
    Age : Man. City
    Club : 7
    Apps : –
    Goals : Sven Kums

    I'd love some fresh ideas on how to tackle this.

    Thanks,

    Michael

    #36998
    Dave Wyatt
    Moderator

    Seems like it works fine if you get rid of the "| Where { $_.InnerText }" filter, but web scraping is always going to be fragile. Any cosmetic changes to their site will potentially screw up your script.

    $teamID = '13'
    $squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
    # No Where filter, so empty cells keep their position in the list
    $data = ($squads.ParsedHtml.getElementsByTagName('td')).OuterText
    $results = for ($i = 0; $i -lt $data.Count; $i += 6) {
        [PSCustomObject]@{
            Name      = $data[$i]
            Birthdate = $data[$i+1]
            Age       = $data[$i+2]
            Club      = $data[$i+3]
            Apps      = $data[$i+4]
            Goals     = $data[$i+5]
        }
    }
    $results 
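
To see why the filter matters, here is a minimal offline sketch. It does not touch the UEFA page: the `$cells` array, the player names, and the `ConvertTo-PlayerRecord` helper are all made up for illustration, and the empty cell is modelled as `$null`. The point is only that a truthiness filter drops empty cells, so a fixed stride of six no longer lines up with the table rows.

```powershell
# Hypothetical stand-in for the <td> text of two table rows, six cells each.
# The second row has one empty cell (the club), modelled here as $null.
$cells = @(
    'Player A', '01/01/1990', '26', 'Club A', '8', '3',
    'Player B', '02/02/1995', '20', $null,    '1', '0'
)

# Hypothetical helper: group a flat array into records of six fields.
function ConvertTo-PlayerRecord {
    param([object[]]$Data)
    for ($i = 0; $i -lt $Data.Count; $i += 6) {
        [PSCustomObject]@{
            Name      = $Data[$i]
            Birthdate = $Data[$i+1]
            Age       = $Data[$i+2]
            Club      = $Data[$i+3]
            Apps      = $Data[$i+4]
            Goals     = $Data[$i+5]
        }
    }
}

# Filtering on truthiness removes the empty cell: 11 items instead of 12.
$filtered = @($cells | Where-Object { $_ })

# With the filter, Player B's Apps value slides into the Club field.
$shifted = @(ConvertTo-PlayerRecord -Data $filtered)

# Without the filter, the empty cell still occupies its slot and rows stay aligned.
$aligned = @(ConvertTo-PlayerRecord -Data $cells)

$shifted | Format-Table
$aligned | Format-Table
```

In a longer list the shift carries into every following record, which matches the pattern in the sample output in the first post.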
    
    
    #37034
    Michael Maher
    Participant

    Thanks Dave, I understand the volatility of scraping. This is just a fun exercise with my son. I won't get fired, but I'll go down in his estimation if it breaks.

    Based on the advice you gave, I removed the whole Where clause. This yielded no results but it pointed me in the right direction. After some fiddling around I stumbled on some working code.

    $teamID = '13'
    $squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
    $data = ($squads.ParsedHtml.getElementsByTagName('td') | Where { $_ }).InnerText
    # Group the flat list of cell text into records of six fields each
    $results = for ($i = 0; $i -lt $data.Count; $i += 6) {
        [PSCustomObject]@{
            Name      = $data[$i]
            Birthdate = $data[$i+1]
            Age       = $data[$i+2]
            Club      = $data[$i+3]
            Apps      = $data[$i+4]
            Goals     = $data[$i+5]
        }
    }
    $results
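
Since a fixed stride of six silently produces shifted records whenever a cell goes missing, one possible safeguard is to check the cell count before grouping. This is only a sketch: `ConvertTo-Record`, its parameters, and the sample values are hypothetical and not part of the scripts above. It assumes every row of the table has the same number of cells.

```powershell
# Hypothetical helper: group a flat cell list into records, failing loudly
# when the number of cells is not a whole number of rows.
function ConvertTo-Record {
    param(
        [object[]]$Cells,
        [string[]]$FieldNames
    )
    $width = $FieldNames.Count
    if ($Cells.Count % $width -ne 0) {
        throw "Expected a multiple of $width cells but got $($Cells.Count)."
    }
    for ($i = 0; $i -lt $Cells.Count; $i += $width) {
        # An ordered dictionary keeps the properties in field order
        $record = [ordered]@{}
        for ($j = 0; $j -lt $width; $j++) {
            $record[$FieldNames[$j]] = $Cells[$i + $j]
        }
        [PSCustomObject]$record
    }
}

# Usage with made-up data; in the scripts above $data would take the place of $sample.
$fields = 'Name', 'Birthdate', 'Age', 'Club', 'Apps', 'Goals'
$sample = @(
    'Player A', '01/01/1990', '26', 'Club A', '8', '3',
    'Player B', '02/02/1995', '20', $null,    '1', '0'
)
$records = @(ConvertTo-Record -Cells $sample -FieldNames $fields)
$records | Format-Table
```

If a cell has been dropped along the way, the call throws instead of returning misaligned records, which makes the problem visible straight away.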
    