Recruiters and HR folks: ever feel like you're flying blind when it comes to what other companies are offering? You’re not alone. Most job boards won’t let you export listings, and copy-pasting a hundred roles by hand isn’t anyone’s idea of a good time. If you want a smarter way—one where you can pull down current job listings, salaries, and requirements for analysis—web scraping is your friend. This guide is for HR professionals who want to scrape job data (ethically and legally!) with zero fluff and minimal frustration.
We’ll walk through every step using Scrapingbee, a dead-simple web scraping API that handles a lot of the annoying parts for you. No need to be a programmer, but you will need to run a few Python scripts. If you can use Excel and follow instructions, you’re good.
Let’s get you set up.
Step 1: Know the Rules Before You Scrape
First, a quick word of caution. Web scraping isn’t illegal, but scraping the wrong way can get you blocked or worse. Here’s what matters:
- Check the site’s Terms of Service. Many public job boards tolerate light scraping of public listings for personal/internal use, but some (like LinkedIn) explicitly prohibit it.
- Don’t hammer the site. Aggressive scraping can slow down or disrupt sites. That’s bad form.
- Stick to public data. Never try to grab data behind logins or paywalls.
- Use scraped data internally. Don’t republish or resell what you scrape.
Pro tip: If you’re scraping your own job listings (say, from your company’s ATS), you’re in the clear. For third-party sites, stay polite—set your script to go slow and don’t scrape thousands of jobs at once.
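If you want a quick programmatic signal too, Python’s standard library can read a site’s robots.txt, the file where sites spell out what they want bots to avoid. Here’s a minimal sketch using the made-up job board from later steps (a courtesy check, not legal advice):

```python
from urllib.robotparser import RobotFileParser

# Read the site's robots.txt and ask whether the jobs page is open to bots
rp = RobotFileParser('https://example.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'https://example.com/jobs'))  # True means not disallowed
```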
Step 2: Set Up Your Tools
You’ll need:
- A Scrapingbee account (free trial is fine)
- Python installed (version 3.7 or newer)
- A text editor (Notepad works, but VS Code is nicer)
- Basic comfort running commands in a terminal
If you don’t have Python, download it from python.org. To check if it’s installed, run:
```sh
python --version
```
You should see something like `Python 3.10.8`.
Install Required Python Packages
Open your terminal and run:
```sh
pip install requests pandas beautifulsoup4
```
- `requests` lets you make web requests easily
- `beautifulsoup4` parses the HTML you get back so you can pull out job details
- `pandas` is for organizing scraped data into a spreadsheet
You’re set on the tech side.
Step 3: Sign Up for Scrapingbee and Get Your API Key
Go to Scrapingbee’s signup page and create an account. Once you’re in, you’ll find your API key on the dashboard. Copy it somewhere safe—you’ll need it for your script.
Why Scrapingbee? Most job boards use JavaScript or block bots. Scrapingbee solves both problems by acting like a real browser and rotating your IP behind the scenes. You don’t have to mess with proxies, CAPTCHAs, or headless browsers.
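Once you have your key, it’s worth a thirty-second sanity check before writing any real code. Here’s a minimal sketch (swap in your key; `example.com` is just a stand-in):

```python
import requests

API_KEY = 'YOUR_SCRAPINGBEE_API_KEY'

# One bare-bones request through Scrapingbee's API endpoint
response = requests.get(
    'https://app.scrapingbee.com/api/v1/',
    params={'api_key': API_KEY, 'url': 'https://example.com'},
)
print(response.status_code)  # 200 means your key and setup are working
```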
Step 4: Pick a Job Board and Inspect the Listings
Choose a site—something public, like Indeed, Glassdoor, or your local job board. For this example, we’ll use a made-up public job board at `https://example.com/jobs` (swap this for your real target later).
Open the job listings page in Chrome. Right-click a job title and select “Inspect.” You’ll see the HTML structure. Look for patterns:
- Is each job in a `<div class="job-listing">`?
- Do they use `<h2>` for the job title?
- Where are salary and location listed?
Don’t get bogged down. You just need to spot the repeating structure so you can tell your script what to grab.
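If you’d rather let a script do the eyeballing, here’s a small sketch that prints the first matching element so you can confirm a selector before writing the full scraper (the `saved_page.html` filename and the `job-listing` class are placeholders for whatever your site actually uses):

```python
from bs4 import BeautifulSoup

# Assumes you've saved the page's HTML to a file first
# (in Step 5, you could use response.text instead)
html = open('saved_page.html', encoding='utf-8').read()
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('div', class_='job-listing')  # placeholder class: use your site's
print(first.prettify() if first else 'No match - adjust the selector')
```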
Step 5: Write a Simple Scraping Script
Here’s a basic script to fetch and parse job listings. You’ll need to adjust the selectors (the `find` and `find_all` lines) to match your actual target site.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

API_KEY = 'YOUR_SCRAPINGBEE_API_KEY'
TARGET_URL = 'https://example.com/jobs'

def scrape_jobs():
    # Ask Scrapingbee to fetch the page and execute its JavaScript
    params = {
        'api_key': API_KEY,
        'url': TARGET_URL,
        'render_js': 'true',
    }
    response = requests.get('https://app.scrapingbee.com/api/v1/', params=params)
    soup = BeautifulSoup(response.text, 'html.parser')

    jobs = []
    for job_elem in soup.find_all('div', class_='job-listing'):
        title = job_elem.find('h2').get_text(strip=True) if job_elem.find('h2') else ''
        location = job_elem.find('span', class_='location').get_text(strip=True) if job_elem.find('span', class_='location') else ''
        salary = job_elem.find('span', class_='salary').get_text(strip=True) if job_elem.find('span', class_='salary') else ''
        jobs.append({'Title': title, 'Location': location, 'Salary': salary})

    df = pd.DataFrame(jobs)
    df.to_csv('job_listings.csv', index=False)
    print(f"Scraped {len(jobs)} jobs. Saved to job_listings.csv")

if __name__ == '__main__':
    scrape_jobs()
```
What’s happening here:
- The script tells Scrapingbee to fetch the page and run JavaScript (so you get the real listings, not just a blank page).
- It parses the HTML to extract job info.
- It saves the results as a CSV for easy Excel analysis.
Replace `YOUR_SCRAPINGBEE_API_KEY` and `TARGET_URL` with your info.
Pro tip: If you see “No jobs found” or just a single row, the site’s structure probably changed. Re-inspect the page and update the selectors.
Step 6: Handle Pagination (Multiple Pages)
Most job boards paginate listings. To scrape more than just the first page, you’ll need to loop through pages.
Usually, page URLs look like:
```
https://example.com/jobs?page=1
https://example.com/jobs?page=2
```
Tweak your script like this:
```python
for page in range(1, 6):  # Scrape first 5 pages
    page_url = f'https://example.com/jobs?page={page}'
    params['url'] = page_url
    response = requests.get('https://app.scrapingbee.com/api/v1/', params=params)
    # (rest of the parsing code here)
```
Don’t go overboard. Five pages is plenty for most HR analysis. Remember the polite scraping rules.
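Putting that politeness into code, here’s a hedged sketch of the full paginated loop with a short pause between pages (the two-second delay is an arbitrary courtesy, and the selectors are the same placeholders as in Step 5):

```python
import time

import requests
from bs4 import BeautifulSoup

API_KEY = 'YOUR_SCRAPINGBEE_API_KEY'
params = {'api_key': API_KEY, 'render_js': 'true'}

all_jobs = []
for page in range(1, 6):  # first 5 pages is plenty for most HR analysis
    params['url'] = f'https://example.com/jobs?page={page}'
    response = requests.get('https://app.scrapingbee.com/api/v1/', params=params)
    soup = BeautifulSoup(response.text, 'html.parser')
    for job_elem in soup.find_all('div', class_='job-listing'):
        title = job_elem.find('h2')
        all_jobs.append({'Title': title.get_text(strip=True) if title else ''})
    time.sleep(2)  # brief pause between pages, per the polite-scraping rules in Step 1

print(f"Collected {len(all_jobs)} jobs across 5 pages")
```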
Step 7: Clean and Analyze Your Data
Open `job_listings.csv` in Excel or Google Sheets. You can now:
- Sort by salary to see who’s paying what
- Filter by job title (e.g., “Software Engineer”)
- Spot trends in requirements and benefits
If you know a bit of Python, pandas lets you slice and dice this data even more.
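As a taste, here’s a rough sketch of the kind of cleanup and counting pandas makes easy (column names match the script from Step 5; the title normalization is deliberately crude):

```python
import pandas as pd

df = pd.read_csv('job_listings.csv')

# Normalize case and whitespace so duplicate titles group together
df['Title'] = df['Title'].str.strip().str.lower()

# Keep only rows that actually list a salary
with_salary = df[df['Salary'].notna() & (df['Salary'] != '')]

# Which roles show up most often?
print(df['Title'].value_counts().head(10))
print(f"{len(with_salary)} of {len(df)} listings include a salary")
```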
Reality check: Scraped data is always a bit messy. Titles might be inconsistent (“Sr. Dev” vs. “Senior Developer”), and some salaries will be missing or “competitive.” That’s normal—don’t expect perfection.
Step 8: Automate (Optional, but Handy)
Want to check job boards weekly? Set your script to run on a schedule using Windows Task Scheduler or cron (on Mac/Linux).
- Windows: Google “Task Scheduler run Python script.”
- Mac/Linux: Use `crontab -e` and add your schedule.
Set it to email you the CSV, or just overwrite the old one.
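For the cron route, here’s a sample entry that runs the script every Monday at 8 a.m. (the interpreter and script paths are placeholders; point them at your own):

```sh
# min hour day-of-month month day-of-week  command (paths are placeholders)
0 8 * * 1 /usr/bin/python3 /path/to/scrape_jobs.py
```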
What to Ignore (for Now)
- CAPTCHA-busting tricks: Scrapingbee usually gets past these out of the box. If you hit a wall, try a different site.
- AI-powered scraping tools: Most are overkill for this job and can be pricey.
- Scraping behind logins: Seriously, don’t do this unless you’re scraping your own data.
Gotchas and Honest Limits
- Sites change layouts. Your script will break sooner or later. Check the selectors if you stop getting results.
- You won’t get 100% clean data. Even big companies struggle here.
- Scraping isn’t real-time. You’re getting a snapshot, not a live feed.
But for most HR needs—salary benchmarking, competitor analysis, or keeping tabs on market trends—a simple scrape every month or so is all you need.
Keep It Simple, Iterate Often
You don’t need a PhD in data science to get value from web scraping. The hardest part is just trying it the first time. Start with one site, get a rough CSV, and see what you can learn. Want more? Add more sites or fields. Don’t let perfect be the enemy of done.
If you’re stuck, retrace your steps, double-check the selectors, and remember: nobody gets this right on the first try. The data is out there—now you know how to grab it.