
March 27, 2016 at 4:33 am

Hi Folks,

I was determined to work this one out without seeking help. However, after four days, during which I did learn a bit, I have to admit defeat.

Here's the code to retrieve data.

$teamID = '13'
$squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
$data = ($squads.ParsedHtml.getElementsByTagName('td') | Where { $_.InnerText }).Outertext
$results = for ($i = 0; $i -lt $data.count; $i += 6) {
    [PSCustomObject]@{
        Name      = $data[$i]
        Birthdate = $data[$i+1]
        Age       = $data[$i+2]
        Club      = $data[$i+3]
        Apps      = $data[$i+4]
        Goals     = $data[$i+5]
    }
}
$results

In the sample results below, you can see that a null club value for Leander Dendoncker shifts every subsequent value into the wrong field.

Name : Thibaut Courtois
Birthdate : 11/05/1992
Age : 23
Club : Chelsea
Apps : 8
Goals : 3

Name : Nacer Chadli
Birthdate : 02/08/1989
Age : 26
Club : Tottenham
Apps : 4
Goals : 1

Name : Leander Dendoncker
Birthdate : 15/04/1995
Age : 20
Club : –
Apps : –
Goals : Marouane Fellaini

Name : 22/11/1987
Birthdate : 28
Age : Man. United
Club : 8
Apps : 4
Goals : Eden Hazard

Name : 07/01/1991
Birthdate : 25
Age : Chelsea
Club : 9
Apps : 5
Goals : Adnan Januzaj

Name : 05/02/1995
Birthdate : 21
Age : Man. United
Club : 1
Apps : –
Goals : Vincent Kompany

Name : 10/04/1986
Birthdate : 29
Age : Man. City
Club : 7
Apps : –
Goals : Sven Kums

I'd love some fresh ideas on how to tackle this.

Thanks,

Michael

March 27, 2016 at 4:50 am

Seems like it works fine if you get rid of the "| Where { $_.InnerText }" filter, but web scraping is always going to be fragile. Any cosmetic changes to their site will potentially screw up your script.

$teamID = '13'
$squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
$data = ($squads.ParsedHtml.getElementsByTagName('td')).Outertext
$results = for ($i = 0; $i -lt $data.count; $i += 6) {
    [PSCustomObject]@{
        Name      = $data[$i]
        Birthdate = $data[$i+1]
        Age       = $data[$i+2]
        Club      = $data[$i+3]
        Apps      = $data[$i+4]
        Goals     = $data[$i+5]
    }
}
$results
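
To see why the "Where { $_.InnerText }" filter causes the shift, here is a minimal offline sketch. The $cells array is hypothetical stand-in data (not scraped from the page), with '-' standing in for the dash the site shows and an empty string for the missing club. Once the empty cell is filtered out, the next player's name slides into the Goals slot.

```powershell
# Hypothetical flattened cell text for two players; the first has an empty Club cell.
$cells = 'Leander Dendoncker', '15/04/1995', '20', '', '-', '-',
         'Marouane Fellaini', '22/11/1987', '28', 'Man. United', '8', '4'

# An empty string is falsy, so the filter silently drops that cell.
$filtered = @($cells | Where-Object { $_ })

# 12 cells went in, 11 came out: no longer a multiple of 6.
$filtered.Count

# Index 5 should be Dendoncker's Goals, but it now holds the next player's name.
$filtered[5]
```

Stepping through $filtered six items at a time therefore misaligns every record after the first empty cell, which matches the sample output in the original post.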

March 28, 2016 at 2:25 pm

Thanks Dave, I understand the volatility of scraping. This is just a fun exercise with my son. I won't get fired, but I'll go down in his estimation if it breaks.

Based on the advice you gave, I removed the whole Where clause. This yielded no results but it pointed me in the right direction. After some fiddling around I stumbled on some working code.

$teamID = '13'
$squads = Invoke-WebRequest -Uri "http://www.uefa.com/uefaeuro/season=2016/teams/team=$teamID/squad/index.html"
$data = ($squads.ParsedHtml.getElementsByTagName('td') | Where { $_ }).innertext

$results = for ($i = 0; $i -lt $data.count; $i += 6) {
    [PSCustomObject]@{
        Name      = $data[$i]
        Birthdate = $data[$i+1]
        Age       = $data[$i+2]
        Club      = $data[$i+3]
        Apps      = $data[$i+4]
        Goals     = $data[$i+5]
    }
}
$results
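
One possible way to make this less sensitive to empty cells is to build each record from a single table row rather than from one flat list of cells. The sketch below is an alternative, not the code from this thread: ConvertTo-Player is a hypothetical helper name, the sample rows are illustrative stand-ins, and the commented loop at the end assumes the page keeps six td cells per player row and that ParsedHtml is available (Windows PowerShell). That loop has not been tested against the live site.

```powershell
# Hypothetical helper: build one player object from the text of one row's cells.
# Empty cells stay as empty strings, so later columns cannot slide out of place.
function ConvertTo-Player {
    param([string[]]$Cells)

    # Skip header rows or anything that is not a six-column player row.
    if ($Cells.Count -ne 6) { return }

    [PSCustomObject]@{
        Name      = $Cells[0]
        Birthdate = $Cells[1]
        Age       = $Cells[2]
        Club      = $Cells[3]
        Apps      = $Cells[4]
        Goals     = $Cells[5]
    }
}

# Offline check with stand-in rows; the second row has an empty Club cell.
$rows = @(
    @('Nacer Chadli', '02/08/1989', '26', 'Tottenham', '4', '1'),
    @('Leander Dendoncker', '15/04/1995', '20', '', '-', '-')
)
$players = foreach ($r in $rows) { ConvertTo-Player -Cells $r }
$players

# Against the live page, the same helper might be fed row by row (assumption):
# foreach ($row in $squads.ParsedHtml.getElementsByTagName('tr')) {
#     $cells = @($row.getElementsByTagName('td') | ForEach-Object { [string]$_.innerText })
#     ConvertTo-Player -Cells $cells
# }
```

The [string] cast in the commented loop is there so that a cell with no text becomes an empty string instead of disappearing from the array.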