

Humans Test RSAs Twice a Year.
Our Agent Does It Every Week.

Responsive search ad headline testing is one of the highest-leverage tasks in a Google Ads account. It's also one of the most consistently neglected — because it's time-consuming, repetitive, and easy to deprioritise. We built an AI agent that handles it automatically: pulling live ad data via the Google Ads API, auditing keyword coverage, generating headline and description variants from the landing page and an LLM, and running rolling A/B tests on a weekly and monthly cadence.

10%+ CTR

Up from ~8% across managed accounts

Quality Score ↑

Indirect reduction in CPC and CPA

Weekly

Reviews on high-impression ad groups

The Problem

Everyone Knows RSA Testing Matters. Almost Nobody Does It Consistently.

Responsive search ads give Google up to 15 headlines and 4 descriptions to mix and match. In theory, this creates a built-in testing surface — Google is constantly rotating combinations and surfacing the ones that perform. In practice, the quality of what Google can produce is entirely limited by the quality of what you put in. If your 15 headlines are mediocre, the best combination of mediocre headlines is still mediocre.

The task of systematically improving those headlines and descriptions — identifying which are underperforming, writing better challengers, implementing them, and reviewing results — is exactly the kind of work that gets squeezed out of a busy account manager's week. It doesn't have a clear deadline. The impact is real but gradual. And doing it properly, across multiple ad groups in multiple campaigns, takes longer than it looks.

The result across most accounts managed by humans is the same: ad copy gets set up at campaign launch, maybe revised once or twice in the first few months when the account is fresh, and then left largely unchanged for the rest of the year. The account continues to run. The headlines continue to age.

We built an agent to solve this problem permanently — not just for one client, but as a deployable system that runs across any Google Ads account we manage.

"An account manager might update RSAs once or twice in a six-month period — not because they don't know it matters, but because the task is genuinely tedious and there's always something more urgent. The agent has no such competing priorities."

— Brendan Andrew Chase, Extra Large Marketing Digital

The Rules

The Agent Operates Within a Strict Set of Constraints — by Design.

Automated ad testing without guardrails is a way to introduce chaos into an account very quickly. The agent doesn't just generate and push variants freely — it operates within a set of rules that reflect how RSA testing should actually be done.

Impressions threshold before testing

The agent only makes changes to ad groups that have accumulated enough impressions to generate statistically meaningful data. Ad groups below the threshold are reviewed weekly but left unchanged until the data is there.

CTR is the metric — not conversion rate

The ad's job is to earn the click from someone searching the right query. The landing page's job is to convert that click. Optimising RSAs against conversion rate conflates the two. The agent tests and judges on CTR; landing page optimisation is handled separately.

Top 2 keywords must own 2 headlines

The agent queries the Keyword Planner for every keyword in the ad group, identifies the top two by search volume, and ensures those keywords each appear in at least one headline. Keyword relevance in headlines is one of the strongest drivers of Quality Score and ad rank.

Weekly review, monthly implementation

The agent checks performance data weekly to stay on top of trends. Actual headline changes are implemented on a monthly cadence — frequent enough to compound improvements, infrequent enough to let each test accumulate a meaningful data window before being replaced.
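A minimal sketch of how the impressions threshold and a CTR-based decision could fit together. The 1,000-impression threshold, the function names, and the choice of a two-proportion z-test are assumptions for illustration; the case study does not state which threshold or statistical test the agent uses.

```python
import math

# Hypothetical threshold: the source does not give a number.
MIN_IMPRESSIONS = 1000


def is_eligible(impressions: int, min_impressions: int = MIN_IMPRESSIONS) -> bool:
    """Ad groups or variants below the threshold are reviewed but left unchanged."""
    return impressions >= min_impressions


def ctr_z_score(clicks_a: int, impr_a: int, clicks_b: int, impr_b: int) -> float:
    """Two-proportion z-score for the CTR of B versus A (positive means B is higher)."""
    p_a = clicks_a / impr_a
    p_b = clicks_b / impr_b
    pooled = (clicks_a + clicks_b) / (impr_a + impr_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    if std_err == 0:
        return 0.0
    return (p_b - p_a) / std_err


def challenger_wins(clicks_a: int, impr_a: int, clicks_b: int, impr_b: int,
                    z_threshold: float = 1.96) -> bool:
    """Judge on CTR only, and only once both sides have enough impressions."""
    if not (is_eligible(impr_a) and is_eligible(impr_b)):
        return False
    return ctr_z_score(clicks_a, impr_a, clicks_b, impr_b) > z_threshold


# Incumbent at 8% CTR, challenger at 10% CTR, 5,000 impressions each.
print(challenger_wins(400, 5000, 500, 5000))
# Same CTRs, but too few impressions to decide.
print(challenger_wins(8, 100, 10, 100))
```

Keeping the eligibility check separate from the comparison mirrors the weekly/monthly split: the weekly review can call `is_eligible` on fresh data, while the monthly implementation only acts where `challenger_wins` holds.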

How It Works

Four Inputs, One Agent, Continuous Improvement

The agent pulls from four sources to build each round of test variants — live ad performance data, keyword volume data, the client's own website copy, and an LLM for additional creative generation. Each source contributes something different to the final output.

Agent Architecture

Google Ads API

Pulls all campaigns, ad groups, RSAs, and asset-level CTR data, authenticated with a developer token and OAuth 2.0 credentials

Keyword Planner API

Ranks keywords in each ad group by search volume; top 2 are locked into headlines

Landing page scraper

Extracts H1, H2, and H3 text from the destination URL — real copy, real brand voice

LLM generation

Generates additional headline and description variants within character limits and RSA policy rules

Rules engine & variant builder

Enforces keyword coverage, character limits, and duplication rules; selects the lowest-performing existing headline(s) to replace; assembles the final test variant set

Weekly review

Reads performance data; flags ad groups accumulating enough impressions for a decision

Monthly update via API

Pushes new headline and description variants to the account; retires the losers
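As a rough illustration of the rules engine and variant builder step, the sketch below filters candidate headlines by character limit and duplication, then swaps them in for the lowest-CTR existing headlines. The function name, data shapes, and example headlines are hypothetical; the 30-character headline limit and 15-headline cap are Google's published RSA limits.

```python
MAX_HEADLINE_CHARS = 30  # Google Ads RSA headline character limit
MAX_HEADLINES = 15       # maximum headlines per RSA


def build_variant_set(current: dict[str, float], candidates: list[str],
                      n_replace: int = 1) -> list[str]:
    """Return the next headline set.

    `current` maps existing headline text to its observed CTR.
    `candidates` are proposed headlines (scraped or LLM-generated).
    """
    # Lowest-CTR existing headlines are the ones eligible for retirement.
    losers = sorted(current, key=current.get)[:n_replace]

    # Track everything already used, including losers, so a retired
    # headline is not re-added under different capitalisation.
    seen = {h.casefold() for h in current}
    accepted = []
    for cand in candidates:
        text = cand.strip()
        if not text or len(text) > MAX_HEADLINE_CHARS:
            continue  # empty or over the character limit
        if text.casefold() in seen:
            continue  # duplicate of an existing or already accepted headline
        seen.add(text.casefold())
        accepted.append(text)
        if len(accepted) == n_replace:
            break

    # Only retire as many losers as there are valid replacements.
    retire = set(losers[:len(accepted)])
    kept = [h for h in current if h not in retire]
    return (kept + accepted)[:MAX_HEADLINES]


current = {
    "Plumber Near You": 0.09,
    "Call Us Today": 0.04,
    "24/7 Emergency Plumbing": 0.11,
}
candidates = [
    "call us today",  # duplicate, rejected
    "This headline is far too long to fit in thirty characters",
    "Same Day Plumbing Repairs",
]
print(build_variant_set(current, candidates))
```

If no candidate survives the checks, nothing is retired, so a bad generation round cannot shrink the ad.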

Google Ads API — Pulling the Full Picture

The agent connects to Google Ads through the Google Ads API, authenticated with a developer token and standard OAuth 2.0 credentials. This is the same access method available to any developer with a Google Ads account. It pulls all campaigns, ad groups, and RSAs, along with asset-level performance data: how often each individual headline and description was shown, and what CTR it generated. This gives a full view of what's working and what isn't before a single change is made.
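The sketch below shows one way the asset-level data could be shaped once it has been pulled. The query string is an assumption about what such a pull might look like in Google Ads Query Language, and its resource and field names should be verified against the current API reference before use. The aggregation function is hypothetical and works on plain dictionaries, so it runs without the API client.

```python
from collections import defaultdict

# Assumed GAQL query; check field names against the current API reference.
ASSET_QUERY = """
SELECT
  ad_group.id,
  ad_group_ad_asset_view.field_type,
  asset.text_asset.text,
  metrics.impressions,
  metrics.clicks
FROM ad_group_ad_asset_view
WHERE ad_group_ad_asset_view.field_type IN ('HEADLINE', 'DESCRIPTION')
  AND segments.date DURING LAST_30_DAYS
"""


def ctr_by_asset(rows: list[dict]) -> list[dict]:
    """Sum impressions and clicks per asset and return rows sorted by CTR, lowest first."""
    totals = defaultdict(lambda: [0, 0])  # (field_type, text) -> [impressions, clicks]
    for row in rows:
        key = (row["field_type"], row["text"])
        totals[key][0] += row["impressions"]
        totals[key][1] += row["clicks"]

    report = [
        {
            "field_type": field_type,
            "text": text,
            "impressions": impressions,
            "clicks": clicks,
            "ctr": clicks / impressions if impressions else 0.0,
        }
        for (field_type, text), (impressions, clicks) in totals.items()
    ]
    return sorted(report, key=lambda r: r["ctr"])


# Hypothetical rows, as they might look after flattening the API response.
rows = [
    {"field_type": "HEADLINE", "text": "Call Us Today", "impressions": 600, "clicks": 24},
    {"field_type": "HEADLINE", "text": "Call Us Today", "impressions": 400, "clicks": 16},
    {"field_type": "HEADLINE", "text": "Plumber Near You", "impressions": 1000, "clicks": 90},
]
for entry in ctr_by_asset(rows):
    print(entry["text"], entry["impressions"], round(entry["ctr"], 3))
```

Sorting lowest-CTR first puts the candidates for replacement at the top of the list.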

Keyword Planner — Locking in the Top Two

For every ad group that qualifies for testing, the agent queries the Keyword Planner with the full keyword list from that group and retrieves search volume estimates. The top two keywords by volume are then checked against the current RSA headlines. If they're not present, the agent includes them in the next round of variants — ensuring that the highest-volume terms the campaign is bidding on are always represented in the ad copy itself.

This keyword-to-headline alignment is one of the most direct levers for Quality Score. When a user's search term matches text that appears in the ad headline, Google tends to rate the ad as more relevant, which can raise the Quality Score, improve ad rank, and over time lower what you pay per click.
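The top-two check can be sketched in a few lines. The function name and the example volumes are hypothetical, and the case-insensitive substring match is a simplifying assumption; the source does not say how the agent decides that a keyword "appears" in a headline.

```python
def missing_top_keywords(volumes: dict[str, int], headlines: list[str],
                         top_n: int = 2) -> list[str]:
    """Return the top-N keywords by search volume that no headline contains."""
    top = sorted(volumes, key=volumes.get, reverse=True)[:top_n]
    lowered = [h.casefold() for h in headlines]
    return [kw for kw in top if not any(kw.casefold() in h for h in lowered)]


# Hypothetical search volumes for one ad group.
volumes = {
    "emergency plumber": 5400,
    "plumber near me": 8100,
    "blocked drain repair": 1900,
}
headlines = ["Emergency Plumber On Call", "Licensed & Insured"]

print(missing_top_keywords(volumes, headlines))
```

Any keyword this returns would be passed to the variant builder as a required headline term for the next round.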

Landing Page Scraper — Using the Client's Own Words

One of the most reliable sources of headline material is the landing page the ad points to. The agent scrapes the destination URL for each ad group and extracts the H1, H2, and H3 tag text. These headings reflect how the client describes their own service in their own voice — they're already written with the offer in mind, they're already on-brand, and they tend to carry the kind of specific, concrete language that performs better in ads than generic alternatives.

There's also a relevance benefit: when the language in the ad matches the language on the landing page, the transition from click to landing feels consistent, which supports both Quality Score and user experience.
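A minimal sketch of the heading extraction, using only Python's standard-library HTML parser. The class and function names are hypothetical, fetching the page is left out, and the sketch assumes the headings are present in the served HTML rather than rendered by JavaScript.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text of h1, h2, and h3 elements in document order."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (tag, text) tuples
        self._current = None  # heading tag currently open, if any
        self._parts = []      # text fragments inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS and self._current is None:
            self._current = tag
            self._parts = []

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse whitespace left over from markup and line breaks.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append((tag, text))
            self._current = None


def extract_headings(html: str) -> list[tuple[str, str]]:
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings


page = """
<html><body>
  <h1>Emergency <span>Plumbing</span> Repairs</h1>
  <p>Body copy is ignored.</p>
  <h2>  Same Day
        Service </h2>
  <h4>Deeper headings are skipped</h4>
</body></html>
"""
print(extract_headings(page))
```

Headings that fit within the RSA character limits can go straight into the candidate pool; longer ones are still useful as context for the generation step.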

LLM Generation — Filling the Remaining Slots

The remaining headline and description slots are filled by an LLM, given the keyword list, the landing page content, the existing ad copy, and the character limit constraints as context. The LLM generates variants that go beyond straight keyword insertion — testing different angles, benefit statements, urgency cues, and calls to action within Google's RSA policies.

The point is not to hand creative control entirely to the model — the keyword-coverage rule and the page-scraping ensure that the fundamentals are always grounded. The LLM handles the creative surface area that a human would otherwise either spend time on or skip entirely.
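To make the generation step concrete, here is a sketch of how the context might be assembled into a prompt and how the model's reply could be screened before it reaches the rules engine. The prompt wording and both function names are assumptions, not the agent's actual prompt, and the call to the LLM itself is deliberately left out.

```python
def build_headline_prompt(keywords: list[str], page_headings: list[str],
                          existing_headlines: list[str],
                          n_variants: int = 5, max_chars: int = 30) -> str:
    """Assemble keywords, landing page copy, existing ads, and limits into one prompt."""
    lines = [
        f"Write {n_variants} new Google Ads responsive search ad headlines.",
        f"Each headline must be at most {max_chars} characters.",
        "Do not repeat any existing headline.",
        "Return one headline per line with no numbering.",
        "",
        "Keywords: " + ", ".join(keywords),
        "Landing page headings: " + " | ".join(page_headings),
        "Existing headlines: " + " | ".join(existing_headlines),
    ]
    return "\n".join(lines)


def parse_headlines(response_text: str, max_chars: int = 30) -> list[str]:
    """Keep non-empty lines within the character limit; models do not always comply."""
    candidates = []
    for line in response_text.splitlines():
        text = line.strip()
        if text and len(text) <= max_chars:
            candidates.append(text)
    return candidates


prompt = build_headline_prompt(
    keywords=["plumber near me", "emergency plumber"],
    page_headings=["Emergency Plumbing Repairs", "Same Day Service"],
    existing_headlines=["Plumber Near You", "Call Us Today"],
)
print(prompt)

# Hypothetical model reply, including one line that breaks the limit.
reply = "Fast Local Plumbers\n\nThis headline is much too long to be a valid RSA headline\n Book Online Today "
print(parse_headlines(reply))
```

Re-checking length after generation matters because the character limit in the prompt is a request, not a guarantee.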

An Important Distinction

The Ad's Job Is the Click. The Landing Page's Job Is the Conversion.

One of the most common mistakes in RSA testing is optimising for conversion rate rather than click-through rate. The logic seems sound — you want conversions, so optimise toward them. In practice, this approach introduces confounding variables and often leads to worse decisions.

The ad doesn't control what happens after the click. The landing page does. A visitor who clicks an ad and doesn't convert didn't necessarily click the wrong ad; they may have hit a slow page, an unclear offer, or a form that asked for too much. Evaluating the ad on the outcome of something it has no control over produces an unreliable signal.

What the ad does control is whether the right person — someone searching for what you offer — decides the result looks worth clicking. That's CTR. The agent tests and optimises on that basis. Landing page conversion rate is a separate optimisation track, addressed through A/B testing the page itself.

The exception is when an ad makes an unusually specific promise — "free initial consultation" or "same day service" — that the landing page either fulfils or doesn't. In those cases, the ad copy is a direct input to conversion. But outside of that kind of specificity, CTR is the right metric for RSA testing, and that's what the agent uses.

Why this matters for Quality Score too

Google's Quality Score is partly based on expected CTR — its prediction of how likely your ad is to be clicked given the search query. Improving actual CTR through better headlines signals to Google that the ad is more relevant, which feeds back into Quality Score. Higher Quality Score means better ad rank at the same bid, or the same ad rank at a lower bid. The CTR improvements from RSA testing compound over time through this mechanism.

The Result

CTRs Consistently Above 10%. Quality Scores Up. Testing That Never Stops.

Across accounts running the agent, click-through rates have moved from around 8% to consistently above 10%. That's a meaningful shift: more traffic from the same number of impressions, and a stronger relevance signal for the ad platform, which tends to lower cost per click over time without changing bids.
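A quick worked example of what that shift means in clicks. The 10,000-impression figure is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def clicks_at(impressions: int, ctr_percent: int) -> int:
    """Clicks for a given impression count and whole-number CTR percentage."""
    return impressions * ctr_percent // 100


impressions = 10_000  # hypothetical monthly impressions for one ad group

clicks_before = clicks_at(impressions, 8)   # 800 clicks at 8% CTR
clicks_after = clicks_at(impressions, 10)   # 1,000 clicks at 10% CTR
relative_uplift = (clicks_after - clicks_before) / clicks_before

print(clicks_before, clicks_after, f"{relative_uplift:.0%}")
```

Two percentage points of CTR is a 25% increase in clicks from the same impressions.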

Quality Scores have also risen across ad groups where the keyword coverage rule has aligned top-volume keywords with headlines. The indirect effect on cost per click and cost per acquisition is real but harder to isolate, since landing page tests and bid strategy changes are happening in parallel. What can be said with confidence is that the direction of movement is consistent; the most plausible driver is the headline-to-keyword alignment the agent enforces.

The compounding effect is what makes this approach valuable over time. A human reviewing RSAs once every few months produces occasional improvements with long gaps between them. The agent produces consistent incremental improvements every month, with a weekly review layer that catches anything unusual early. Over a 12-month period, that is twelve rounds of headline changes against the one or two a year that manual review typically manages.

This agent is now deployed across multiple client accounts in different industries. The core architecture — API pull, keyword planner check, page scraping, LLM generation, rules engine — is the same for each. The prompt context and keyword coverage thresholds are adjusted per account. Setup is typically completed within a single engagement.

8% → 10%+

Typical CTR improvement across managed accounts

Monthly

Cadence of headline changes vs. 1–2× per year without the agent

Any Account

Deployable across industries — same agent, adjusted context

Want This Running on Your Google Ads Account?

If your RSAs haven't been properly tested in the last few months — or ever — this is one of the highest-leverage improvements you can make to an account without touching bids or budgets. Get in touch with details on your account and we'll explain how the agent would be set up for your specific campaigns.

