PowerShell help for extracting specific links from a list of websites.

This topic contains 3 replies, has 2 voices, and was last updated by Tanjamuse 2 months, 2 weeks ago.

  • Author
  • #95220



    I'm hoping someone can help me adapt some code that works on one site so that it works on another.

    This is what I have working for one site:

    $InputLinksFile = "c:\temp\InputLinks.txt"
    $OutputLinksFile = "C:\temp\OutputLinks.txt"
    $InputLinks = @()

    $BasePage = "https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?&srt=2&lan=1&r=10&p="
    [int]$FirstPageNumber = 600
    [int]$LastPageNumber = 601
    $CurrentPageNumber = $FirstPageNumber

    # Make a list of all the pages we want to input, counting from FirstPageNumber to LastPageNumber
    while ($CurrentPageNumber -le $LastPageNumber) {
        $InputLinks += "$BasePage$CurrentPageNumber"
        # Advance the counter, otherwise the loop never ends
        $CurrentPageNumber++
    }

    # If you want to manually input a list of pages instead, remove # in front of the next line:
    # $InputLinks = Get-Content -Path $InputLinksFile

    ForEach ($InputLink in $InputLinks) {
        # Fetch the entire page and get the links in it with ().Links. The page is compressed with gzip, so we'll have to account for that
        $InputPageLinks = (Invoke-WebRequest -Uri $InputLink -Headers @{"Accept-Encoding"="gzip"}).Links
        # Filter the link list to only contain links with the sequence "/1/" in it.
        $FilteredOutputLinks = $InputPageLinks | Where-Object {$_.href -like "*/1/*"}
        # The provided links are relative and not absolute, so we need to add the domain name to the output
        foreach ($OutputLink in $FilteredOutputLinks) {
            $FinalLink = "https://fanfiction.net$($OutputLink.href)"
            Out-File -Append -FilePath $OutputLinksFile -InputObject $FinalLink
        }
        # Clear the page's links only after all of them have been written out
        Clear-Variable InputPageLinks
    }
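    A more compact way to build the page list, and a way to turn relative links into absolute ones without hard-coding the domain, is sketched below. It assumes the same fanfiction.net `$BasePage` and page range as the script above; `Resolve-RelativeLink` is a hypothetical helper name, not a built-in cmdlet. It resolves each href against the page it was found on using the .NET `System.Uri` class.

```powershell
# Sketch only: same fanfiction.net base URL and page range as the script above.
$BasePage = "https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?&srt=2&lan=1&r=10&p="
[int]$FirstPageNumber = 600
[int]$LastPageNumber = 601

# The range operator (..) yields 600, 601, so no manual counter is needed.
$InputLinks = $FirstPageNumber..$LastPageNumber | ForEach-Object { "$BasePage$_" }

# Hypothetical helper: resolve a relative href (e.g. "/s/123/1/Title")
# against the page it came from, using System.Uri's (Uri, String) constructor.
function Resolve-RelativeLink {
    param(
        [string]$PageUrl,
        [string]$Href
    )
    $BaseUri = New-Object -TypeName System.Uri -ArgumentList $PageUrl
    (New-Object -TypeName System.Uri -ArgumentList $BaseUri, $Href).AbsoluteUri
}
```

    Because the base URL is taken from the page itself, the same helper keeps working if the listing pages move to another host name.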

    Example link from the new site: https://archiveofourown.org/tags/Buffy%20the%20Vampire%20Slayer/works
    And this is the type of link that needs to be extracted: https://archiveofourown.org/works/13345065
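    For the archiveofourown.org links above, a minimal sketch of the adapted script could look like this. It assumes that the tag listing is paginated with a `?page=N` query parameter and that each work appears on the page as a relative href of the form `/works/<number>`; both are assumptions about the site's markup, so check them against the page source. `Select-AO3WorkLink` and `Get-AO3WorkLink` are hypothetical function names.

```powershell
# Sketch for archiveofourown.org (AO3). ASSUMPTIONS, not verified against the site:
#   - tag listings are paginated as ".../works?page=N"
#   - work links appear as relative hrefs like "/works/13345065"

# Keep only bare work links and make them absolute. The anchored regex drops
# hrefs such as "/works/13345065/chapters/1" or "/tags/.../works";
# Select-Object -Unique removes repeats of the same work.
function Select-AO3WorkLink {
    param(
        [string[]]$Href,
        [string]$Domain = "https://archiveofourown.org"
    )
    $Href |
        Where-Object { $_ -match '^/works/\d+$' } |
        ForEach-Object { "$Domain$_" } |
        Select-Object -Unique
}

# Fetch each listing page and append its work links to the output file.
# This function makes network requests and is not called in this block.
function Get-AO3WorkLink {
    param(
        [string]$BasePage = "https://archiveofourown.org/tags/Buffy%20the%20Vampire%20Slayer/works?page=",
        [int]$FirstPageNumber = 1,
        [int]$LastPageNumber = 1,
        [string]$OutputLinksFile = "C:\temp\OutputLinks.txt"
    )
    foreach ($PageNumber in $FirstPageNumber..$LastPageNumber) {
        $PageLinks = (Invoke-WebRequest -Uri "$BasePage$PageNumber").Links
        Select-AO3WorkLink -Href $PageLinks.href |
            Out-File -Append -FilePath $OutputLinksFile
    }
}
```

    For example, `Get-AO3WorkLink -FirstPageNumber 1 -LastPageNumber 3` would process the first three listing pages. Matching with a regular expression anchored by `^` and `$`, rather than a wildcard such as `-like "*/works/*"`, keeps chapter and bookmark links out of the result.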

    I'm hoping someone can help me.


  • #95233

    Collin Chaffin

    You would need to post an example link here; without it, it's impossible to help. It looks like you may have tried and the forum filtered it out, so perhaps try again and use the code insert.

  • #95235

  • #95310


    I've now edited the first post so the two links are visible.
