Building a custom web scraper in Apify for real estate listings

Looking for real estate data that’s actually useful—and not stuck behind some clunky export button or paywall? You’re not alone. If you want to grab listings direct from the source, you’ll need a web scraper that can handle all the weirdness real estate sites throw at you. This guide is for folks who are ready to roll up their sleeves, not just click “download CSV.”

We’ll walk through building a custom scraper using Apify, a cloud-based platform that does a lot of the grunt work for you. You’ll get the unvarnished truth about what’s easy, what’s annoying, and what to watch out for. By the end, you’ll be able to pull real estate listings into a spreadsheet or database—no magic, just code and common sense.


Why Use Apify for Real Estate Scraping?

Before diving in, let’s get this out of the way: you can write your own Python scripts and run them on your laptop. But if you want something that:

  • Runs in the cloud (no messing with proxies or headless browsers on your machine)
  • Handles scaling, failures, and scheduling
  • Lets you share or even sell your scrapers

…then Apify is worth a look. It’s not free, but it’s way less of a headache than rolling your own infrastructure. If you need something quick and disposable, stick with a simple script. For anything you want to run more than once, Apify is usually the pragmatic choice.

What You’ll Need

  • An Apify account (the free tier is fine to start)
  • Basic JavaScript skills (don’t worry, you won’t be writing React)
  • The URL of a real estate site you want to scrape (pick one that doesn’t block bots like it’s their only job)
  • A clear idea of what data you want (price, address, description, images, etc.)

Pro tip: Scraping isn’t magic. If the data isn’t visible on the page or is loaded after you scroll, you’ll need to handle that (more on this later).


Step 1: Pick Your Target and Get Permission

First, choose a real estate website. Ideally, pick one that:

  • Lists data you actually need (don’t scrape just because you can)
  • Doesn’t have aggressive anti-bot measures
  • Is legal and ethical to scrape (read their terms—most sites aren’t fans, but you’re less likely to get in trouble if you scrape slowly and don’t republish the data)

If the site has an API or a “download data” button, use that instead. Scraping should be your last resort, not your first option.


Step 2: Explore the Data (a.k.a. Inspect the Page)

Open the site in Chrome (or Firefox), right-click on a listing, and click “Inspect.” Look for:

  • How are listings structured? (Table, grid, list)
  • What HTML tags hold the data you want? (<div class="price">, <span class="address">, etc.)
  • Is the data right there, or loaded later with JavaScript?
  • Do images or links point to full URLs, or just partial ones?

You’ll use these selectors in your scraper. If the listings only load after you scroll, or if you see a bunch of blank <div>s, the site uses JavaScript to build the page. You’ll need a headless browser—good news, Apify’s default tools can handle this.
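A quick sanity check before you write any actor code: paste something like this into the DevTools console on a listings page. The .listing, .price, and .address selectors here are placeholders; swap in whatever you found while inspecting.

```javascript
// Run in the browser's DevTools console on a listings page.
// '.listing', '.price', and '.address' are placeholder selectors.
const cards = document.querySelectorAll('.listing');
console.log('listing cards found:', cards.length);

// Spot-check the first card before committing to these selectors
console.log({
    price: cards[0]?.querySelector('.price')?.innerText,
    address: cards[0]?.querySelector('.address')?.innerText,
});
```

If the count is zero but you can see listings on screen, the page is probably built with JavaScript after load, which is exactly the case the Puppeteer-based setup below handles.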


Step 3: Create an Apify Actor

Apify calls its individual scrapers “Actors.” Think of them as reusable little programs that run in the cloud.

  1. Log in to Apify and click “New Actor.”
  2. Choose the “Web scraper” or “Puppeteer” template (Puppeteer is for sites that need a real browser).
  3. Name your actor—something like real-estate-scraper.

You’ll land in an online code editor. Don’t panic.


Step 4: Write (or Tweak) the Scraper Code

Here’s a skeleton using Apify’s Puppeteer crawler (the crawler class comes from the Crawlee library, which Apify’s templates pull in for you):

```javascript
const { Actor } = require('apify');
const { PuppeteerCrawler } = require('crawlee');

Actor.main(async () => {
    const input = await Actor.getInput();
    const { startUrls } = input;

    const requestQueue = await Actor.openRequestQueue();
    for (const { url } of startUrls) {
        await requestQueue.addRequest({ url });
    }

    const crawler = new PuppeteerCrawler({
        requestQueue,
        requestHandler: async ({ page, request }) => {
            // Wait for listings to load
            await page.waitForSelector('.listing'); // Change to the real selector

            // Pull one object per listing card out of the page
            const data = await page.evaluate(() => {
                const items = [];
                document.querySelectorAll('.listing').forEach((listing) => {
                    items.push({
                        price: listing.querySelector('.price')?.innerText || '',
                        address: listing.querySelector('.address')?.innerText || '',
                        // Add more fields as needed
                    });
                });
                return items;
            });

            // Each pushed item becomes one row in the dataset
            for (const item of data) {
                await Actor.pushData(item);
            }
        },
        maxRequestsPerCrawl: 50,
        // More options: proxies, timeouts, etc.
    });

    await crawler.run();
});
```
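The startUrls the skeleton destructures comes from the Actor’s input. A minimal input that satisfies it looks like this (shown as a JS object; in the Apify console you paste the equivalent JSON into the actor’s Input tab, and the URL is a placeholder):

```javascript
// Shape of the Actor input the skeleton expects. The URL is a placeholder.
// In the Apify console, enter the equivalent JSON on the actor's Input tab.
const exampleInput = {
    startUrls: [
        { url: 'https://example.com/listings' },
    ],
};
```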

What to change:

  • Replace .listing, .price, .address with real selectors from your target site.
  • Tweak maxRequestsPerCrawl if you want more or fewer pages.
  • If the site paginates, you’ll need to add logic to enqueue next pages.

Honest take: Most “templates” only get you halfway. Every site is different. You’ll almost always need to poke around and adjust selectors.


Step 5: Handle Pagination

If you only scrape the first page, you’ll miss most listings. Pagination is usually a “Next” button or a set of numbered links.

Two common patterns:

  • Standard “Next” button:
    Find the selector for the next page, grab its URL, and add it to your request queue.
  • URL-based pages:
    URLs change like example.com/listings?page=2. Just increment the number and enqueue those URLs.

Sample code to enqueue the next page (this goes in the request handler from Step 4):

```javascript
requestHandler: async ({ page, request, enqueueLinks }) => {
    // ...extract your data...

    // Enqueue the "next" page link; if the selector matches nothing,
    // this is simply a no-op on the last page.
    await enqueueLinks({
        selector: 'a.next', // Replace with the real selector
        baseUrl: request.loadedUrl,
    });
},
```
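For the page=2 style of pagination, you can skip link discovery entirely and enqueue the numbered pages up front, using the requestQueue from the Step 4 skeleton. This is a sketch; the URL pattern and the page count are placeholders for whatever your target site actually uses:

```javascript
// Enqueue numbered result pages directly. The URL pattern and the page
// count (20) are placeholders -- adjust both for your target site.
for (let pageNum = 1; pageNum <= 20; pageNum++) {
    await requestQueue.addRequest({
        url: `https://example.com/listings?page=${pageNum}`,
    });
}
```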

Don’t bother: Scraping infinite scroll pages is a pain. If you can, use “page=2” style URLs instead. If you must scroll, Apify lets you use page.evaluate(() => window.scrollBy(0, 1000)) in a loop, but it’s flaky.
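If you’re stuck with infinite scroll anyway, a rough version of that loop (inside the request handler) looks like this. It’s a sketch, not something battle-tested; the selector and the iteration cap are arbitrary:

```javascript
// Rough infinite-scroll loop: scroll, wait, check whether new listings appeared.
// The '.listing' selector and the 20-iteration cap are arbitrary placeholders.
let previousCount = 0;
for (let i = 0; i < 20; i++) {
    await page.evaluate(() => window.scrollBy(0, 1000));
    await new Promise((resolve) => setTimeout(resolve, 1000)); // give new items time to load
    const count = await page.$$eval('.listing', (els) => els.length);
    if (count === previousCount) break; // nothing new appeared, stop scrolling
    previousCount = count;
}
```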


Step 6: Run and Debug Your Scraper

Click “Run” in Apify’s editor. Watch the logs:

  • If you see timeouts, try increasing wait times or check your selectors.
  • If your data is all blank, your selectors probably don’t match.
  • If you get blocked, try adding proxies; Apify has built-in proxy pools (see the sketch after this list).
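Wiring Apify’s proxy pool into the crawler is a small change to the Step 4 skeleton. A sketch; note that proxy traffic counts against your Apify plan:

```javascript
// Route requests through Apify Proxy. Proxy usage counts against your plan.
const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new PuppeteerCrawler({
    requestQueue,
    proxyConfiguration,
    requestHandler: async ({ page, request }) => {
        // ...same extraction logic as in Step 4...
    },
});
```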

Pro tip: Always test on a small batch of URLs before going big. Sites can change layout or block you mid-run.


Step 7: Export and Use Your Data

Your results show up in the “Dataset” tab. You can:

  • Download as CSV, Excel, or JSON
  • Hook up an API to pull data directly
  • Schedule the Actor to run every day/week/etc.

Don’t overcomplicate it: If all you need is a spreadsheet, just grab the CSV. Save the fancy database integrations for later.
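If you do outgrow manual downloads, the dataset is also reachable over Apify’s HTTP API. A minimal sketch, assuming Node 18+ for fetch; the dataset ID and token here are placeholders, and the Dataset tab in the console shows the exact URL for your run:

```javascript
// Pull dataset items over HTTP. DATASET_ID and APIFY_TOKEN are placeholders;
// the Dataset tab in the Apify console shows the exact URL for your run.
const DATASET_ID = 'your-dataset-id';
const APIFY_TOKEN = 'your-api-token';

const res = await fetch(
    `https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=json&token=${APIFY_TOKEN}`
);
const listings = await res.json();
console.log(`Fetched ${listings.length} listings`);
```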


Step 8: Stay Out of Trouble

A few things people ignore (and regret later):

  • Don’t hammer the site. Keep concurrency low (minConcurrency: 1, maxConcurrency: 2) and cap the request rate so you don’t get blocked (see the sketch after this list).
  • Don’t scrape personal data. Pulling names, emails, or phone numbers is riskier (and sometimes illegal).
  • Check copyright. Don’t republish scraped listings as your own. Use the data for analysis, not cloning the site.
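Those throttling knobs drop straight into the crawler options from Step 4. A sketch; the numbers are conservative starting points, not official recommendations:

```javascript
// Conservative throttling: few parallel requests, capped request rate.
const crawler = new PuppeteerCrawler({
    requestQueue,
    requestHandler: async ({ page, request }) => {
        // ...extraction logic from Step 4...
    },
    minConcurrency: 1,
    maxConcurrency: 2,
    maxRequestsPerMinute: 30, // rough cap; tune for the site you're scraping
});
```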

What Works, What Doesn’t, What to Ignore

  • Works well: Grabbing public listing info, prices, addresses, and links—especially from sites with “old-school” HTML.
  • Gets tricky: Sites with lots of JavaScript, infinite scrolling, or aggressive bot protection.
  • Ignore the hype: “No-code” scrapers promise a lot, but for real estate sites, you’ll almost always need to tweak code. Don’t waste hours trying to avoid it.

Wrapping Up

Building a custom web scraper in Apify isn’t rocket science, but expect to do some trial and error. Start small: get one page working, then handle pagination, and only then think about scaling up. Don’t chase “perfect”—get something working, see what breaks, and fix as you go. Most of all, keep it simple. The less clever your scraper is, the longer it’ll keep working.