
How to Create a Web Scraping Tool in PowerShell

Write a tool in PowerShell to gather all the data from a web page.

Written by: Adam Bertram, Senior Writer | Updated Dec 04, 2024
Gretchen Grunburg, Senior Editor
Web scraping tools are helpful for gathering data from various web pages. For example, price comparison sites that share the best deals usually grab their information from specific feeds e-tailers set up for that purpose. However, not all online sellers make price feeds available. In these instances, comparison sites can use web scraping to grab the information they need.

Because website design varies and websites have unique structures, you must create customized scrapers to extract relevant data effectively. Luckily, scripting languages like PowerShell help you build reliable web scraping tools. You can use PowerShell modules to extract the information you need.

Tip: Monitor your competitors' prices by creating a web scraper and running it once daily using Windows Task Scheduler. To run your scraper as part of a web application, host it on an Internet Information Services (IIS) server and manage it with IIS application pools.
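Scheduling a daily run can be sketched with the built-in ScheduledTasks module; the script path and task name below are hypothetical placeholders:

```powershell
# A minimal sketch, assuming your scraper lives at C:\Scripts\Get-Prices.ps1
# (hypothetical path and task name -- substitute your own).
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -File C:\Scripts\Get-Prices.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 6am
Register-ScheduledTask -TaskName 'DailyPriceScrape' -Action $action -Trigger $trigger
```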

Web scraping explained

Web scraping is the process of parsing an HTML web page and gathering elements in a structured manner. Because HTML pages have specific structures, it’s possible to parse through them and retrieve semi-structured output. Note the use of the qualifier “semi.” Most pages aren’t perfectly formatted behind the scenes and may have website design mistakes, so your output may not be perfectly structured. 

Still, scripting languages like Microsoft PowerShell — along with a little ingenuity and some trial and error — can help you build reliable web scraping tools that pull information from many different web pages. 

It’s important to remember that web page structures vary widely. If even a small element is changed, your web scraping tool may no longer work. Focus on the basics first and then build more specific tools for particular web pages.

Bottom line: Web scraping can enhance your marketplace knowledge greatly. However, you may want to consult a business lawyer about the legalities of scraping specific sites before you get started.

What makes PowerShell suitable for web scraping

Federico Trotta, a technical writer and data scientist who has authored numerous articles on web scraping and data analysis, noted that PowerShell comes pre-installed on Windows, making it an accessible and flexible tool for users. “In particular, its integration with Windows makes it easily accessible without requiring additional installations or dependencies,” Trotta explained. “Additionally, its compatibility with .NET libraries provides a layer of extensibility for more advanced needs.”

Still, Trotta cautioned that PowerShell may not be suitable for more complex projects. “When it gets more complex, use different tools or technologies,” Trotta advised. “One of the main limitations is that with PowerShell, you can only scrape static HTML content. When pages have dynamically loaded content from JavaScript, you can overcome this by using Selenium.”

How to create a web scraping tool in PowerShell

The command of choice is Invoke-WebRequest. This command should be a staple in your web scraping arsenal. It simplifies pulling down web page data and allows you to focus on parsing the data you need.

Trotta emphasized the importance of mastering this method. “To tie to PowerShell, in the beginning, I would suggest learning the methods Invoke-WebRequest and Invoke-RestMethod, as these cmdlets form the backbone of most PowerShell scraping scripts,” Trotta explained. “In particular, the Invoke-WebRequest cmdlet gets content from a web page on the internet; the Invoke-RestMethod cmdlet, instead, sends HTTP and HTTPS requests to REST web services that return richly structured data.”
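To illustrate the difference between the two cmdlets, here is a quick sketch; the GitHub API endpoint is just one example of a REST service that returns structured JSON:

```powershell
# Invoke-WebRequest returns the raw page plus parsed properties (Links, Images, etc.).
$page = Invoke-WebRequest -Uri 'https://www.google.com'

# Invoke-RestMethod converts a JSON response straight into PowerShell objects.
$repo = Invoke-RestMethod -Uri 'https://api.github.com/repos/PowerShell/PowerShell'
$repo.full_name   # the repository's name, pulled from the parsed JSON
```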

With Invoke-WebRequest, let’s explore how a web scraper views a web page and extracts its content.

1. See how a web scraping tool views Google.

To get started, let’s use a simple web page everyone is familiar with — Google.com — and see how a web scraping tool views it. 

First, pass Google.com to the Uri parameter of Invoke-WebRequest and inspect the output.

$google = Invoke-WebRequest -Uri google.com

This is a representation of the entire Google.com page, all wrapped up in an object for you.

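A few ways to peek inside that object (a sketch; the exact properties vary slightly between PowerShell versions):

```powershell
$google = Invoke-WebRequest -Uri google.com
$google.StatusCode      # HTTP status code, e.g. 200
$google.Content.Length  # size of the raw HTML string
$google | Get-Member    # list every property the response object exposes
```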
Did you know: The Invoke-WebRequest command is highly versatile. It works on FTP and HTTP sites, which gives you more choices on where to source information and data.

2. Pull information from the web page.

Now, let’s see what information you can pull from this web page. For example, say you need to find all the links on the page. To do this, you’d reference the Links property. This will enumerate the various properties of each link on the page.

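For example, enumerating every link on the page looks like this:

```powershell
# Each element carries properties such as href and, in Windows PowerShell 5.1, innerText.
$google.Links
```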

Perhaps you just want to see the URLs the links point to:

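Pulling only the href property of every link looks like this:

```powershell
# Just the target URL of each link on the page.
$google.Links.href
```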

How about the anchor text and the URL? Since this is just an object, it’s easy to pull information like this:
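A sketch using Select-Object (note that innerText is populated by Windows PowerShell 5.1's HTML parser; newer PowerShell versions expose fewer link properties):

```powershell
$google.Links | Select-Object innerText, href
```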

You can also see what the famous Google.com search form with the input box looks like under the hood:

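Inspecting the form works the same way (the Forms property is available in Windows PowerShell 5.1, which uses Internet Explorer's parser):

```powershell
$google.Forms            # the page's form object(s)
$google.Forms[0].Fields  # the form's input fields and their default values
```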
Did you know: If your scraper stops working, the website structure has likely changed. Unfortunately, you'll have to build a new web scraper.

How to download the information you’ve scraped

Let’s take this one step further and download information from a web page. For example, perhaps you want to download all of the images on a page. To do this, we’ll also use the -UseBasicParsing parameter, which makes the request faster because Invoke-WebRequest skips building the Internet Explorer-based DOM representation of the page.

1. Download images from the webpage.

For another example, here’s how to use PowerShell to enumerate all images on the CNN.com website and download them to your local computer.

$cnn = Invoke-WebRequest -Uri cnn.com -UseBasicParsing

2. Find the images’ URL hosts. 

Now, let’s figure out the URL where each image is hosted.

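A sketch of both steps — listing every image source and reducing it to the unique hosts (note that some src values may be relative URLs, which [uri] cannot parse into a host):

```powershell
# Every image source URL on the page.
$cnn.Images.src

# The unique hosts those images are served from.
$cnn.Images.src | ForEach-Object { ([uri]$_).Host } | Sort-Object -Unique
```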

3. Download the images. 

Once you have the URLs, you can use Invoke-WebRequest again. However, this time, you’ll use the -OutFile parameter to send the response to a file.

@($cnn.Images.src).foreach({
    $fileName = $_ | Split-Path -Leaf
    Write-Host "Downloading image file $fileName"
    Invoke-WebRequest -Uri $_ -OutFile "C:\$fileName"
    Write-Host 'Image download complete'
})


In this case, the images were saved directly to the root of the C: drive, but you can easily change this location to a different one. Because PowerShell can also manage file system ACLs, you can grant whatever permissions are needed to save images to the directory of your choice.

4. Test the images from PowerShell.

If you’d like to test the images directly from PowerShell, use the Invoke-Item command to pull up the image’s associated viewer. Below, you can see that Invoke-WebRequest pulled down an image from CNN.com with the word “bleacher.”

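For example, assuming one of the downloaded files was named bleacher.jpg (a hypothetical filename), you could open it like this:

```powershell
# Opens the image in whatever viewer is associated with .jpg files.
Invoke-Item -Path 'C:\bleacher.jpg'
```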

Building your own web scraping tool is straightforward

Use the code in this article as a template to build your own tool. For example, you could build a PowerShell function called Invoke-WebScrape with a few parameters like –Url or –Links. Once you have the basics down, you can easily create a customized tool to apply in many different ways.
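One way such a function might be sketched — Invoke-WebScrape and its parameters are illustrative, not a built-in cmdlet:

```powershell
function Invoke-WebScrape {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Url,

        [switch]$Links  # when set, return only the link URLs
    )

    $response = Invoke-WebRequest -Uri $Url -UseBasicParsing
    if ($Links) {
        $response.Links.href
    } else {
        $response.Content
    }
}

# Usage: Invoke-WebScrape -Url 'https://www.google.com' -Links
```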

If you’re new to web scraping, Trotta suggests starting with more foundational skills. “Familiarity with HTML structure and basic CSS selectors is fundamental to parsing web content. Without this familiarity, you cannot scrape web pages,” Trotta explained. “Focus on small, manageable projects at first, such as extracting headlines from a news website, to build confidence. Then, scale to improve.”

Trotta suggests incorporating the following features to improve performance and reliability in PowerShell web scraping scripts:

  • Start-Sleep: This cmdlet helps avoid overwhelming target servers, which could result in IP bans or legal issues, by suspending the script’s activity for a specified time.
  • Try and catch blocks: Use these blocks to ensure your script can gracefully handle unexpected responses, such as timeouts or 404 errors. 
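Both suggestions can be combined into one polling loop; the URLs below are placeholders:

```powershell
$urls = 'https://example.com/page1', 'https://example.com/page2'  # placeholder URLs

foreach ($url in $urls) {
    try {
        $page = Invoke-WebRequest -Uri $url -UseBasicParsing
        Write-Host "Scraped $url (status $($page.StatusCode))"
    } catch {
        # Handle timeouts, 404s and other failures without stopping the run.
        Write-Warning "Failed to scrape ${url}: $_"
    }
    Start-Sleep -Seconds 2  # pause between requests to avoid overwhelming the server
}
```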

With these approaches, developers can create scripts that are efficient and resilient.

Mark Fairlie contributed to this article.

Written by: Adam Bertram, Senior Writer
Adam Bertram is an IT expert and business owner who has spent decades advising on network administration and security, designing and building infrastructure, and creating and teaching courses on Windows Server, PowerShell and more. While maintaining his own IT business, he has provided hands-on DevOps services for clients like JPMorgan Chase. At business.com, Adam covers the ins and outs of PowerShell, helping companies improve their Windows configurations and automations. Bertram, who has a degree in computer science, holds Microsoft, Cisco and CompTIA credentials. He has written numerous tutorials, guides and books, including "Building Better PowerShell Code: Applying Proven Practices One Tip at a Time."