This topic contains 7 replies and has 3 voices.
January 1, 2019 at 7:11 am #132177
I am trying to download a CSV file from a webpage. The CSV link is hidden behind a JavaScript URL, and I am having difficulty downloading its contents: I get the entire page instead of the data in the CSV. Could someone provide some pointers on how to fix this?
$url = "https://www.mcxindia.com/market-data/bhavcopy"
# First request creates the web session (cookies) reused by the later calls
Invoke-WebRequest $url -SessionVariable session -UseBasicParsing
# Second request reads the page and its form
$addUserSite = Invoke-WebRequest $url -WebSession $session
$addUserForm = $addUserSite.Forms
# Name the CSV export link as the postback target
$addUserForm.Fields["__EVENTTARGET"] = 'ctl00$cph_InnerContainerRight$C001$lnkExpToCSV'
$addUserForm.Fields["__EVENTARGUMENT"] = ''
$filename = "C:\temp\Data.txt"
# Post the form back and save the response to disk
Invoke-WebRequest -Uri $url -Method Post -Body $addUserForm.Fields -WebSession $session -UseBasicParsing -OutFile $filename
January 1, 2019 at 9:53 am #132179 Participant Points: 30 Rank: Member
I have never used this method for downloading files, but while trying to download large files at regular intervals for a project, I came across a similar article that might be of some help to you.
January 2, 2019 at 1:50 am #132230 Participant Points: 487 Rank: Contributor
mridul7arya68, the link you provided is the one NR is already referencing.
NR, some sites are simply coded to block this kind of automation, for whatever reason the owners chose. That is, they only allow access through real human interaction, to prevent bots and the like from hammering their site or resources.
Now, I am not saying this is the case for the site you are hitting, but I've run into this more and more. Sure, it's irritating and disappointing, but the site owners and devs can do what they want, regardless of how it impedes what I am trying to automate to make my life or a corporate process easier.
What you are doing does, of course, get the raw file, but it is not the same as the interactive human action of clicking the link.
The data you are after is there, in that CDATA block, so you'll have to parse all of that.
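As a rough illustration of the parsing step, here is a minimal sketch that pulls the text out of CDATA sections with a regular expression. The helper name `Get-CDataBlock` and the sample markup are hypothetical; the real page's CDATA content may be laid out quite differently, so treat this as a starting point rather than a working scraper.

```powershell
# Hypothetical helper: return the text inside every CDATA section of an HTML string.
function Get-CDataBlock {
    param([Parameter(Mandatory)][string]$Html)

    # (?s) lets '.' span newlines; '.*?' stops at the first closing marker.
    # (?://)? swallows the '//' that script blocks often place before ']]>'.
    $pattern = '(?s)<!\[CDATA\[(.*?)(?://)?\]\]>'
    foreach ($match in [regex]::Matches($Html, $pattern)) {
        $match.Groups[1].Value.Trim()
    }
}

# Made-up sample standing in for the real page source (assumption).
$sample = @'
<script type="text/javascript">
//<![CDATA[
var rows = [{"Symbol":"ABC","Close":101.5}];
//]]>
</script>
'@

$blocks = @(Get-CDataBlock -Html $sample)
$blocks
```

Against the live site, the same function could be fed the `Content` property of an `Invoke-WebRequest` response; what to do with the extracted text depends on what the page actually puts in that block.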
Yet all the headers in the download are dynamically grabbed from an embedded table on the page, so you'd have to deal with that manually as well. At least, that is what I can see when I step through the page code end to end. Also, the fields you are trying to use have no values in them at all, so there is nothing to hit. The file gets generated from those hidden VIEWSTATE* items.
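To make the point about the hidden fields concrete, here is a minimal sketch of collecting every hidden input from the page source and building a postback body from it. `Get-HiddenField`, the sample form, and its field values are hypothetical, and nothing here confirms that this particular site will accept a replayed postback.

```powershell
# Hypothetical helper: collect every hidden <input> name/value pair from an HTML string.
function Get-HiddenField {
    param([Parameter(Mandatory)][string]$Html)

    $fields = @{}
    # Assumes attributes appear in the order type, name, value, as in the sample below.
    $pattern = '(?i)<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"'
    foreach ($m in [regex]::Matches($Html, $pattern)) {
        $fields[$m.Groups[1].Value] = [System.Net.WebUtility]::HtmlDecode($m.Groups[2].Value)
    }
    $fields
}

# Made-up form standing in for the real page source (assumption).
$sample = @'
<form method="post" action="./bhavcopy">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="c2FtcGxlLXZpZXdzdGF0ZQ==" />
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="CA0B0334" />
</form>
'@

# Start from the hidden fields, then name the link that should fire the postback.
$body = Get-HiddenField -Html $sample
$body['__EVENTTARGET']   = 'ctl00$cph_InnerContainerRight$C001$lnkExpToCSV'
$body['__EVENTARGUMENT'] = ''
$body
```

With a live page, `$body` would be built from the `Content` of the first response and passed to `Invoke-WebRequest -Method Post -Body $body -WebSession $session`. Whether the server then returns the CSV rather than the page depends on how the site validates the request.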
January 2, 2019 at 2:59 am #132236
Thank you, postanote, for taking the time to post the reasoning and the detailed explanation. I will now look at using GUI tools to download the file. The only reason I was trying to avoid GUI automation is that I don't think it can run as a background process: it would require a logged-in session with the GUI available.
January 2, 2019 at 5:20 am #132255 Participant Points: 30 Rank: Member
@NR I have no experience with GUI automation. Kindly post how you solve the problem; it would be a great way for me to learn about it.
January 2, 2019 at 10:12 am #132275
@Mridul, postanote mentioned a few automation tools you can use for GUI automation. He provided links to a WordPress site where you can get a quick intro and start developing a few scripts. It is a very good intro to GUI automation using PowerShell.
The idea here is to use these modules, in this case for example, to mimic human action by clicking a link and downloading a file.
January 2, 2019 at 4:41 am #132248 Participant Points: 487 Rank: Contributor
Yeppers, I get that, but there is a good deal happening behind the scenes that you cannot control with what you are doing. That file is not static; it's built dynamically by a function that the interactive click calls.
All the stuff you are after is embedded in parent and child divs, which are just a pain to deal with.
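For a sense of what dealing with that markup involves, here is a minimal sketch that turns a table buried in nested divs into objects that can be piped to `Export-Csv`. The function name `ConvertFrom-HtmlTable` and the sample markup are hypothetical; regex parsing like this is brittle and assumes one simple table whose first row holds the headers.

```powershell
# Hypothetical helper: convert the rows of a simple HTML table into objects.
function ConvertFrom-HtmlTable {
    param([Parameter(Mandatory)][string]$Html)

    $rowPattern  = '(?is)<tr[^>]*>(.*?)</tr>'
    $cellPattern = '(?is)<t[hd][^>]*>(.*?)</t[hd]>'
    $headers = $null

    foreach ($row in [regex]::Matches($Html, $rowPattern)) {
        $cells = @(
            foreach ($cell in [regex]::Matches($row.Groups[1].Value, $cellPattern)) {
                # Strip nested tags (divs, spans) and decode HTML entities.
                $text = $cell.Groups[1].Value -replace '<[^>]+>', ''
                [System.Net.WebUtility]::HtmlDecode($text).Trim()
            }
        )
        if ($cells.Count -eq 0) { continue }
        # The first non-empty row is treated as the header row.
        if ($null -eq $headers) { $headers = $cells; continue }

        $record = [ordered]@{}
        for ($i = 0; $i -lt $headers.Count; $i++) {
            $record[$headers[$i]] = $cells[$i]
        }
        [pscustomobject]$record
    }
}

# Made-up markup standing in for the real page source (assumption).
$sample = @'
<div class="grid"><div class="inner">
<table>
<tr><th>Symbol</th><th>Close</th></tr>
<tr><td><div>ABC</div></td><td>101.5</td></tr>
<tr><td><div>XYZ</div></td><td>98.25</td></tr>
</table>
</div></div>
'@

$rows = @(ConvertFrom-HtmlTable -Html $sample)
$rows | ConvertTo-Csv -NoTypeInformation
```

Replacing `ConvertTo-Csv` with `Export-Csv -Path <file> -NoTypeInformation` would write the result to disk. This only helps if the table is present in the HTML the server returns; rows that the browser fills in with script will not be there.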
January 2, 2019 at 10:06 am #132269
Thanks, postanote. I agree that pursuing it via a PowerShell script is going to be tedious.