Need a list of business contacts—emails, phone numbers, addresses—from public directories? You’re not alone. Whether you’re building a lead list, enriching a CRM, or just want to automate a mind-numbing copy-paste job, scraping directories can save you hours.
But scraping isn’t as easy as it sounds. Sites have anti-bot measures. Data formats are all over the place. And doing this legally and reliably is a minefield. Tools like Scrapingbee promise to handle the grunt work (proxies, captchas, headless browsers), but there are still some things you need to get right.
This guide cuts the fluff. Here’s how to actually extract business contact info from directories—step by step—with Scrapingbee and Python. No empty promises, just what works (and what doesn’t).
1. Know Your Use Case—and the Risks
Before you start, get clear on why you’re scraping and which directories you need. Not every use case is above-board. Even public info can have legal strings attached (terms of service, GDPR, CAN-SPAM, etc.).
Ask yourself:

- Are you scraping for internal analysis, B2B sales, or something else?
- Is the data really public, or just publicly viewable?
- Do you actually need 10,000 contacts, or will 100 hand-picked leads do?
Pro tip: Always check a site’s robots.txt and terms of service. Some sites will block or even sue you for scraping. If you’re in doubt, get legal advice. I know, it’s a hassle, but it’s cheaper than a lawsuit.
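Python's standard library can do the robots.txt check for you. Here's a minimal sketch using `urllib.robotparser`; the rules and URLs below are made up for illustration (in practice you'd point `RobotFileParser` at the live `robots.txt` and call `read()`):

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, standing in for what you'd fetch from the site
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() reports whether a given user agent may crawl a path
print(parser.can_fetch("*", "https://example-directory.com/business/12345"))  # True
print(parser.can_fetch("*", "https://example-directory.com/private/admin"))   # False
```

Remember that robots.txt is only half the story: a permissive robots.txt doesn't override the site's terms of service.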
2. Pick a Directory (and Inspect the Data)
Not all directories are created equal. Some are easy to scrape, others are locked down or messy as hell. Decide where you’ll get your data. Think Yelp, Yellow Pages, local chamber directories, or association lists.
What to look for:

- Consistent layouts: if every business page has different HTML, your life gets harder.
- No login required: logins mean more work and more risk.
- Visible contact info: if emails and phones are hidden behind forms or images, you'll need extra tricks.
How to check:

1. Open a business listing in your browser.
2. Right-click and "Inspect" (or hit F12) to see the HTML.
3. Look for the contact info: are emails and phones in plain text? Embedded in weird JavaScript? Behind a captcha?
If it’s a mess, consider a different source. There’s no shame in picking low-hanging fruit.
3. Set Up Your Scraping Environment
You’ll need Python (3.7+), some basic libraries, and a Scrapingbee API key. Don’t have Python? Install it from python.org.
Install what you need:

```bash
pip install requests scrapingbee beautifulsoup4
```

- `requests` is for API calls.
- `scrapingbee` handles the browser stuff (proxies, rendering JS).
- `beautifulsoup4` parses HTML.
Get your Scrapingbee API key: Sign up at their site, grab your key from the dashboard.
4. Write a Simple Scraper (Just One Page)
Don’t jump into scraping 10,000 pages. Start small. Target a single business listing, and see if you can extract the fields you need.
Example: Scraping business contact info from a single page
```python
import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPINGBEE_API_KEY"
TARGET_URL = "https://example-directory.com/business/12345"

response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": API_KEY,
        "url": TARGET_URL,
        "render_js": "false",  # switch to "true" if the site is JS-heavy
    },
)

soup = BeautifulSoup(response.content, "html.parser")

# Adjust selectors to match the site's HTML
name = soup.select_one("h1.business-name").get_text(strip=True)
phone = soup.select_one("a.phone-link").get_text(strip=True)
email = soup.select_one("a.email-link").get_text(strip=True)
address = soup.select_one("div.address").get_text(strip=True)

print({"name": name, "phone": phone, "email": email, "address": address})
```
What to watch for:

- The selectors (`h1.business-name`, etc.) must match the site's HTML. Inspect and adjust as needed.
- Some directories only show emails/phones after you click or solve a captcha. Scrapingbee can render JS if you need it (`"render_js": "true"`), but it's slower and pricier.
Don’t try to be clever yet. Just get it working for one page.
5. Scale Up: Handle Lists and Pagination
Once you can extract info from one business, move up to scraping multiple listings from a search results or category page.
Steps:

1. Scrape the search results page to get links to individual business listings.
2. For each link, scrape the business page as above.
Sample code for step 1 (getting business URLs from a directory page):

```python
results_url = "https://example-directory.com/search?city=Springfield"

response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": API_KEY,
        "url": results_url,
        "render_js": "false",
    },
)
soup = BeautifulSoup(response.content, "html.parser")
links = [a["href"] for a in soup.select("a.business-link")]

# Make full URLs if needed
full_links = ["https://example-directory.com" + link for link in links]
```
Handling pagination:
If the directory has multiple pages, look for a “Next” button or page numbers in the HTML. Collect the URLs for each results page, and repeat.
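Here's a minimal sketch of that loop. It assumes the directory paginates with a `?page=` query parameter and uses the same `a.business-link` selector as above; both are placeholders you'll need to adapt to your target site:

```python
import time

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPINGBEE_API_KEY"
BASE = "https://example-directory.com"  # placeholder directory

def extract_links(html):
    """Pull business-page URLs out of one results page."""
    soup = BeautifulSoup(html, "html.parser")
    return [BASE + a["href"] for a in soup.select("a.business-link")]

def scrape_all_pages(max_pages=5, delay=2.0):
    """Walk paginated results via ScrapingBee, stopping at the first empty page."""
    all_links = []
    for page in range(1, max_pages + 1):
        response = requests.get(
            "https://app.scrapingbee.com/api/v1/",
            params={
                "api_key": API_KEY,
                # assumed ?page= scheme -- check how your directory actually paginates
                "url": "{}/search?city=Springfield&page={}".format(BASE, page),
                "render_js": "false",
            },
        )
        links = extract_links(response.content)
        if not links:  # an empty page usually means you've run out of results
            break
        all_links.extend(links)
        time.sleep(delay)  # be polite between requests
    return all_links

# Sanity-check the parsing on a fake snippet (no network needed):
sample = '<a class="business-link" href="/business/1">One</a>'
print(extract_links(sample))  # ['https://example-directory.com/business/1']
```

Keeping the HTML parsing in its own function (`extract_links`) lets you test it on saved HTML without burning API credits.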
Be realistic:

- Many directories hate bots. If you hammer them with requests, they'll block you, API or not. Scrapingbee helps, but it's not magic.
- Add `time.sleep()` between requests to be polite (and avoid bans).
- Stick to hundreds, not thousands, of requests per hour.
6. Clean and Validate Your Data
Raw scraped data is always messy. Expect weird whitespace, missing fields, or junk characters. And don’t trust that every email or phone is real.
Tips:

- Strip whitespace and weird symbols.
- Use regex or libraries to validate emails/phones.
- Deduplicate entries.
- Save to CSV or JSON as you go; don't wait until the end.
Example cleaning functions:

```python
import re

def clean_phone(phone):
    """Keep only the digits."""
    return re.sub(r"\D", "", phone)

def is_valid_email(email):
    # Note the escaped dot: an unescaped "." would match any character
    return re.match(r"[^@]+@[^@]+\.[^@]+", email) is not None
```
Don’t obsess over perfection. You’ll never get 100% clean data from scraping. Good enough is fine for most use cases.
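Deduplication is worth a concrete example, because the same business often appears with differently formatted phone numbers. This sketch keys on the digits-only phone number (a hypothetical `dedupe_contacts` helper, not part of any library):

```python
import re

def dedupe_contacts(contacts):
    """Drop entries whose cleaned phone number was already seen."""
    seen = set()
    unique = []
    for contact in contacts:
        key = re.sub(r"\D", "", contact.get("phone", ""))  # digits only
        if key and key in seen:
            continue
        seen.add(key)
        unique.append(contact)
    return unique

contacts = [
    {"name": "Acme", "phone": "(555) 123-4567"},
    {"name": "Acme Inc.", "phone": "555-123-4567"},  # same number, new formatting
]
print(dedupe_contacts(contacts))  # only the first Acme entry survives
```

Entries with no phone number are kept, since you can't tell whether they're duplicates.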
7. Stay Under the Radar (and Respect Boundaries)
Scraping isn’t illegal, but it’s a gray area. Scrapingbee makes it harder for sites to detect you, but you still need to use common sense:
- Don’t hammer the same site with thousands of requests in minutes.
- Rotate user agents and add random delays (Scrapingbee can help).
- Respect robots.txt and terms of service.
- If a site blocks you, take the hint. Move on, or try again later.
- Never try to break through logins or captchas if it feels sketchy.
Bottom line: Scraping is a tool, not a right. Be a good web citizen.
8. Troubleshooting: What to Do When Things Break
Even with Scrapingbee, stuff goes wrong. Here’s what to check when your script stops working:
- Selectors changed: the site updated its HTML. Re-inspect and fix your code.
- Blocked by anti-bot: try `"render_js": "true"`, change user agents, or slow down requests.
- Data missing: sometimes info is loaded via JavaScript. Scrapingbee can render JS, but it costs more credits and is slower.
- Captcha or login wall: unless you have permission or a workaround, it's usually not worth fighting.
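Transient failures often clear up on their own, so it's worth wrapping your fetch in a simple retry with exponential backoff. This is a generic sketch (the helper name and delays are arbitrary), demonstrated with a fake fetcher instead of a live request:

```python
import time

def fetch_with_retries(fetch, max_attempts=3, base_delay=1.0):
    """Call fetch() until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a fake fetcher that fails twice, then succeeds:
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("blocked")
    return "ok"

print(fetch_with_retries(flaky, base_delay=0.01))  # "ok" on the third try
```

In a real script, `fetch` would be a lambda wrapping your `requests.get(...)` call, raising on non-200 responses (e.g. via `response.raise_for_status()`).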
When in doubt: Take a break, re-check the site manually, or ask Scrapingbee support. Sometimes “just try again tomorrow” really works.
9. Export and Use Your Data
Once you’ve got your cleaned list, save it to a CSV or Excel file. Pandas makes this easy:
```python
import pandas as pd

data = [
    {"name": name, "phone": phone, "email": email, "address": address},
    # ... more entries
]
df = pd.DataFrame(data)
df.to_csv("business_contacts.csv", index=False)
```
Don’t forget: Use your data responsibly. Cold emailing scraped emails is a great way to get marked as spam, or worse.
Wrap-Up: Keep It Simple, Iterate Often
Scraping directories for business contacts can save you time, but it’s rarely as simple as “run script, get leads.” Start small, test your approach on a handful of pages, and expect some trial and error. Scrapingbee handles the heavy lifting, but you’ll still need to tweak things as sites change.
Don’t get stuck chasing perfection. Get a script working, see if the data is actually useful, and improve from there. Most importantly: don’t be a jerk about how you use scraped data. The best scrapers are the ones nobody notices.