Thousands of Pages. Every Phone Number. Done by Lunch.

Large websites pick up stale phone numbers the same way old houses pick up bad wiring: slowly, from different people working on different parts at different times. We built an automation that crawls every page of a site, extracts every phone number it finds, runs in checkpointed batches so an interruption never forces a restart, and produces a clean audit report in the time it takes to eat lunch.

1–2 hrs

Typical runtime for most large websites

Any Size

Handles sites with tens of thousands of pages

Zero Loss

Resumes from last batch on error — no restart needed

The Problem

Big Websites Have a Phone Number Problem — and Nobody Knows It Until They Check.

Many of the clients we work with have large websites — hundreds or thousands of pages built up over years, often by multiple teams or agencies working in parallel. One team handles the main service pages. Another manages the blog and SEO content. A third handled a site migration two years ago. Someone else rebuilt the footer last year.

Each of these teams works with the phone number that's current at the time. But when a business switches call tracking providers, changes from a static number to a CallRail dynamic number, or simply updates a phone number for a new region, the update rarely reaches every page. Old phone numbers linger in page footers, in blog post bodies, in landing pages that haven't been touched in three years, in sidebar widgets that nobody remembers creating.

The result is a site where some pages show the correct tracked number and others show a static number that bypasses call tracking entirely. Leads that come in through those pages are invisible to the ad platforms. Cost-per-call figures are overstated, because the spend is divided by fewer calls than actually came in. Campaign performance looks worse than it is, or better in the wrong places. And nobody knows, because nobody has checked every page.

Doing it manually — loading pages one by one, looking at the source or visible number — is not realistic above a few dozen pages. On a site with thousands of URLs, it simply doesn't get done. Which means the problem compounds quietly until someone decides to investigate.

"Different marketing teams working on the same site over years is the most common reason we find stale phone numbers. Nobody deleted the old ones — they just never made it into the update."

The Challenge

At Scale, a Simple Crawler Stalls. You Need Batching and Recovery.

The first version of the tool worked fine on smaller sites. Feed it a domain, it crawls every page it can reach, extracts phone numbers, done. On a site with fifty pages, that's straightforward.

On a site with ten or twenty thousand pages, a single continuous crawl runs into problems. Memory builds up. Connections time out. A network hiccup midway through means the whole run has to start from scratch. If you've been crawling for forty minutes and the job fails at page eight thousand, going back to page one is not acceptable.

The solution was to restructure the automation into batches — crawling a fixed number of pages at a time, writing progress to a checkpoint file after each batch, and checking that checkpoint at the start of every run. If the tool stops for any reason, the next run reads the checkpoint and picks up exactly where the previous run ended. No duplicated work, no lost progress, no starting over.

This pattern applies to any large-scale automation

Any automation that needs to process a large list — URLs, records, files, API responses — should be built with batching and a checkpoint mechanism from the start. Assuming a long-running job will complete without interruption is optimistic. Designing for recovery is the more reliable default.
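As a rough illustration of that pattern, here is a minimal Python sketch of a batch loop with a checkpoint. It is not the tool's actual code: the names process_in_batches and load_position, the JSON checkpoint format, and the default batch size are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def load_position(checkpoint: Path) -> int:
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["next_index"]
    return 0


def process_in_batches(
    items: Sequence[str],
    handle_batch: Callable[[Sequence[str]], None],
    checkpoint: Path,
    batch_size: int = 100,  # hypothetical default; tune per site
) -> None:
    """Process items batch by batch, recording progress after each batch."""
    start = load_position(checkpoint)
    for i in range(start, len(items), batch_size):
        handle_batch(items[i : i + batch_size])
        # This line only runs if the batch succeeded, so a crash inside
        # handle_batch leaves the previously saved position untouched.
        next_index = min(i + batch_size, len(items))
        checkpoint.write_text(json.dumps({"next_index": next_index}))
```

If handle_batch raises partway through, calling process_in_batches again with the same list and checkpoint path repeats only the batch that failed. Everything before it is skipped.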

How It Works

Give It a Domain. Come Back in an Hour. Here's What Happened in Between.

Automation Flow

1. Input: domain URL

The tool is given the root domain; it discovers all crawlable URLs itself via sitemap and link-following

2. Checkpoint check

Reads the progress file — if a previous run was interrupted, skips all already-processed URLs and resumes from the next batch

3. Batched page crawl

Fetches pages in fixed batches; after each batch completes, writes progress to checkpoint before moving to the next

4. Phone number extraction

Parses each page's HTML for phone number patterns — including formatted variants (parentheses, dashes, spaces, country codes)

5. Output: audit report

Each row: page URL + every phone number found on it — ready to sort, filter, and cross-reference against the expected CallRail numbers

1. URL Discovery: Finding Every Page First

Before crawling for numbers, the tool builds a complete list of URLs to check. It starts from the sitemap if one is available, then supplements with link-following from crawled pages to catch any URLs that aren't in the sitemap. On large sites with years of content, the sitemap alone is often incomplete — orphaned pages, old blog categories, archived landing pages that were never formally removed.
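The two discovery sources described above could be sketched as follows, using only the Python standard library. The function names are hypothetical, and the sketch works on already-downloaded text, so fetching, robots.txt handling, and sitemap index files (whose loc entries point at further sitemaps) are left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url: str, html: str) -> set[str]:
    """Return absolute same-host links found in a page, without fragments."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        # Skips tel:, mailto:, and links that leave the site.
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.add(absolute)
    return links
```

The union of both sets, minus anything already visited, gives the list to crawl. Sorting that list before batching keeps its order stable between runs.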

2. Batching and Checkpointing: Surviving the Long Run

The full URL list is divided into batches of a fixed size. After each batch is processed, the tool writes the current position to a checkpoint file — a simple record of how far through the list it has gotten. When the tool starts, the first thing it does is check for an existing checkpoint. If one exists, it skips every URL before that point and begins from the next unprocessed batch.

This means a site with twenty thousand pages can be audited across multiple runs if needed — each run picks up exactly where the previous one ended. In practice, most sites complete in a single run of one to two hours. But the recovery mechanism is there, and it removes the anxiety of running a long automation and wondering what happens if something goes wrong.
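Resuming by position only works if the checkpoint file itself survives a crash and the URL list is the same one the position refers to. The sketch below shows one way to guard both. The file format, the fingerprint check, and the function names are assumptions for illustration, not a description of the tool's internals.

```python
import hashlib
import json
import os
from pathlib import Path


def fingerprint(urls: list[str]) -> str:
    """Hash the ordered URL list so a checkpoint is only reused for the same list."""
    return hashlib.sha256("\n".join(urls).encode("utf-8")).hexdigest()


def save_checkpoint(path: Path, urls: list[str], next_index: int) -> None:
    """Write the checkpoint to a temp file, then swap it in with an atomic rename."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"list": fingerprint(urls), "next_index": next_index}))
    # os.replace swaps the file in one step, so a crash never leaves
    # a half-written checkpoint at `path`.
    os.replace(tmp, path)


def resume_index(path: Path, urls: list[str]) -> int:
    """Return where to resume, or 0 if there is no usable checkpoint."""
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return 0
    if data.get("list") != fingerprint(urls):
        return 0  # the URL list changed, so the saved position no longer applies
    return int(data.get("next_index", 0))
```

Falling back to zero when the list has changed trades some repeated work for correctness: a position into a different list would silently skip pages.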

3. Extraction and Output: What You Get at the End

Each page is fetched and parsed for phone number patterns — common formats including local numbers, numbers with country codes, and formatted variants with parentheses, dashes, or spaces. The output is a flat report: one row per page URL, with every phone number found on that page in adjacent columns.
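A simplified version of this step might look like the sketch below. The regular expression is an assumption: it targets North American style numbers (three-digit area code, seven-digit local number, optional country code) and is deliberately permissive, so other regions would need their own patterns and some false positives are expected. The function names are hypothetical.

```python
import csv
import io
import re

PHONE_RE = re.compile(
    r"""
    (?<![\w+])                # not in the middle of a word or a longer number
    (?:\+?\d{1,3}[\s.-]?)?    # optional country code
    \(?\d{3}\)?[\s.-]?        # area code, optionally in parentheses
    \d{3}[\s.-]?\d{4}         # local number
    (?!\d)                    # not followed by more digits
    """,
    re.VERBOSE,
)


def extract_phone_numbers(text: str) -> list[str]:
    """Return phone-like strings in order of appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in PHONE_RE.finditer(text):
        seen.setdefault(match.group(), None)
    return list(seen)


def write_report(pages: dict[str, str], out) -> None:
    """Write one CSV row per page: the URL, then each number found on it."""
    writer = csv.writer(out)
    for url, html in pages.items():
        writer.writerow([url, *extract_phone_numbers(html)])


# Usage with made-up page content:
buffer = io.StringIO()
write_report({"https://example.com/": "<p>Call (555) 010-1234 today</p>"}, buffer)
```

Because the pattern runs over raw HTML, it also picks up numbers inside tel: links and attributes, which is usually what an audit wants.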

Once you have the report, sorting and filtering take minutes. Sort by phone number to see all the pages still showing an old static number. Filter to the expected CallRail tracking number to confirm which pages are correctly set up. Any page showing a number that isn't in the tracked set is a gap in your call attribution, and now you know exactly which pages need fixing.
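The same cross-reference can be scripted instead of done in a spreadsheet. This sketch assumes the report has been loaded as a mapping from page URL to the numbers found on it, and that numbers are ten-digit national numbers, so formatting and country-code differences are ignored when comparing. The names and the ten-digit rule are illustrative assumptions.

```python
import re


def national_digits(number: str, length: int = 10) -> str:
    """Strip formatting and keep the last `length` digits, dropping any country code."""
    return re.sub(r"\D", "", number)[-length:]


def find_untracked(
    report: dict[str, list[str]], tracked: set[str]
) -> dict[str, list[str]]:
    """Map each page URL to the numbers on it that are not in the tracked set."""
    tracked_keys = {national_digits(n) for n in tracked}
    gaps: dict[str, list[str]] = {}
    for url, numbers in report.items():
        stale = [n for n in numbers if national_digits(n) not in tracked_keys]
        if stale:
            gaps[url] = stale
    return gaps


# Usage with made-up data: one page shows a number outside the tracked set.
report = {
    "https://example.com/": ["(555) 010-1234"],
    "https://example.com/old-landing": ["555-010-9999", "+1 555 010 1234"],
}
gaps = find_untracked(report, tracked={"+15550101234"})
```

Here gaps contains only the old landing page, listed with the one number that is not tracked.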

The Result

Hours of Manual Work Reduced to a Process That Runs While You're at Lunch.

Before this tool, auditing phone numbers on a large site meant either spending hours clicking through pages manually, or not doing it at all and hoping everything was fine. Neither option was good. Manual review doesn't scale past a few dozen pages without becoming a day's work, and skipping the audit means untracked numbers stay untracked indefinitely.

With the automation running, the audit takes as long as it takes to crawl the site, typically one to two hours for most client websites. During that time, no one has to do anything. The tool runs, the checkpoint keeps progress safe, and at the end there's a complete picture of every phone number on every page of the site.

The direct benefit is conversion tracking accuracy. Every page showing a static phone number instead of the CallRail number produces calls that won't be attributed to the campaign that drove them. Finding and fixing those pages closes gaps in attribution that would otherwise silently understate campaign performance or, worse, never get found at all.

The tool is used routinely at the start of any new client engagement that involves call tracking, and whenever a site goes through a significant update or migration. It's one of those things that sounds simple because the problem it solves is simple — but the hours it replaces add up quickly.

Full Coverage

Every page checked — including orphaned pages not in the sitemap

No Babysitting

Start it and leave — batch recovery means errors don't restart the job

Closed Loop

Attribution gaps found and fixed before they compound over months

Got a Large Site With Call Tracking You're Not 100% Sure About?

If your site has grown over years through multiple teams or agencies, the odds are good that at least some pages are showing the wrong number. Running this audit is usually the first thing we do when a new client has call tracking set up. Get in touch and we can run it against your domain.

Written by

Brendan Andrew Chase

Conversion tracking specialist and workflow automation consultant with 10+ years auditing and fixing attribution setups for service businesses and agencies across the US, UK, and EU. 200+ projects delivered. Founder of Extra Large Marketing Digital, based in Rio de Janeiro.