Ads Measurement
Reach & frequency, last-touch attribution windows, CTV quartile completion — the 1P/3P measurement vocabulary.
TL;DR
This is the section for the Senior AE on the 1P & 3P Ads Measurement team. The SQL shape is mechanical — the senior signal is naming the vocabulary unprompted: attribution window, frequency cap, MRC viewability, IVT, last-touch vs MTA, 1P/3P reconciliation.
Three challenges in the pack drill the three highest-frequency patterns: reach & frequency distribution, quartile completion (CTV / VCR), and last-touch attribution with a time window. Read the architecture map and vocabulary below first — they're the lexicon you'll be expected to use unprompted in R2 (data modeling) and R3 (system design), the two rounds where this loop is usually decided.
The skeleton
Three slots: define the exposure set, define the conversion set, join them on identity + a time predicate. The conversion can be a play, a viewable play, or a quartile reach — same shape, different filter.
-- Pattern: ads-measurement attribution / completion / reach
WITH exposures AS (
SELECT event_id AS imp_id, member_id, << ENTITY_ID >>, event_ts AS impression_ts
FROM playback_events
WHERE event_name = 'impression'
),
conversions AS (
SELECT event_id AS conv_id, member_id, << ENTITY_ID >>, event_ts AS conv_ts,
watch_seconds
FROM playback_events
WHERE event_name = 'play_start'
-- Optional: AND watch_seconds >= 0.75 * runtime_seconds -- VCR-style filter
)
SELECT
c.<< ENTITY_ID >>,
COUNT(DISTINCT c.conv_id) AS conversions,
COUNT(DISTINCT e.imp_id) AS impressions,
COUNT(DISTINCT e.member_id) AS unique_reach,
-- Attribution rate: share of conversions with an eligible prior impression
AVG(CASE WHEN EXISTS (
SELECT 1 FROM exposures e2
WHERE e2.member_id = c.member_id
AND e2.<< ENTITY_ID >> = c.<< ENTITY_ID >>
AND e2.impression_ts BETWEEN c.conv_ts - INTERVAL << WINDOW >> AND c.conv_ts
) THEN 1.0 ELSE 0.0 END) AS attribution_rate
FROM conversions c
LEFT JOIN exposures e
ON e.member_id = c.member_id
AND e.<< ENTITY_ID >> = c.<< ENTITY_ID >>
GROUP BY c.<< ENTITY_ID >>
ORDER BY << ENTITY_ID >>;The trap: Fan-out from naïve INNER JOIN on (member, entity)
A single play with 3 eligible impressions becomes 3 rows in a naïve INNER JOIN. Your COUNT(*) triples and the attribution rate goes above 1.0.
Use EXISTS(semi-join — filters the left side, doesn't multiply it), or pre-aggregate the impressions to (member, entity) before joining. The same trap appears in every ads-measurement role; getting it right unprompted is the seniority bar.
Wrong
-- ❌ Fan-out: COUNT(*) is now plays × eligible impressions
SELECT c.title_id,
COUNT(*) AS plays_attributed
FROM plays c
JOIN impressions i
ON i.member_id = c.member_id
AND i.title_id = c.title_id
AND i.impression_ts BETWEEN c.play_ts - INTERVAL 30 MINUTE
AND c.play_ts
GROUP BY c.title_id;Right
-- ✅ EXISTS: each play counted at most once
SELECT c.title_id,
COUNT(*) FILTER (WHERE EXISTS (
SELECT 1 FROM impressions i
WHERE i.member_id = c.member_id
AND i.title_id = c.title_id
AND i.impression_ts BETWEEN c.play_ts - INTERVAL 30 MINUTE
AND c.play_ts
)) AS plays_attributed
FROM plays c
GROUP BY c.title_id;Vocabulary the interviewer expects unprompted
14 termsThe role title is "1P & 3P Ads Measurement Analytics." The biggest difference between a candidate from outside ads and a candidate from inside is vocabulary fluency. Read this once, then read it again before R2/R3 — these are the words you'll use to frame answers.
| Term | Definition | When it matters |
|---|---|---|
| Impression | An ad-pod exposure event. Server-side validated, deduped client-side retries. | The base unit. Everything else is a ratio over impressions. |
| Reach | Count of unique members exposed to ≥1 impression. | Not additive across campaigns or slices without HLL sketches. |
| Frequency | Impressions per reached member; campaign-flight or rolling 30d. | Bimodal in practice. Avg hides the over-exposed tail. |
| Frequency cap | Policy limit: ≤K impressions per (member, window). Enforced server-side. | Netflix's Ads Event Publisher feeds cap state to the ad server in near-real-time. |
| Attribution window | Time after an impression during which a conversion counts. 30 min view-through, 24h–7d click-through is the industry default. | Boundary semantics matter: ≤ or <? Click priority over view? Cross-device? |
| Last-touch attribution | Credit the most-recent eligible impression. The industry default. | Auditable but biased. MTA weights multiple touches; ghost-ads tests measure incrementality. |
| View-through vs click-through | VT = an impression that wasn't clicked but preceded a conversion. CT = an actively engaged impression. | When both fall in window, click wins — needs a priority rank, not pure recency. |
| Incrementality | Counterfactual: would the conversion have happened anyway? Measured via ghost ads, geo holdouts, PSA controls. | The gap between attribution (correlation) and incrementality (causation) is the senior signal. |
| Quartile completion / VCR | % of plays reaching 25/50/75/100% of the ad. VCR = q100 = video completion rate. | Drives CPCV (cost per completed view) pricing in CTV. Rewinds inflate raw watch_seconds. |
| MRC viewability | IAB/MRC standard: 2 consecutive seconds with ≥50% pixels in view (video). A play with watch_seconds > 0 isn't automatically viewable. | Completion rates ship over viewable plays only, not all plays. |
| IVT (invalid traffic) | Non-human or fraudulent traffic. Filtered by vendors like DoubleVerify, Integral Ad Science. | 1P logs include IVT; 3P reports filter it. Big source of 1P/3P discrepancy. |
| 1P / 3P measurement | 1P = Netflix's own ad server + event pipeline. 3P = DSP, ad server, verification vendor (DV, IAS), audience panel (Nielsen). | The role title. Reconciling discrepancies (timezone, IVT, counting method, dedup) is core daily work. |
| Pacing | Cumulative actual delivery vs cumulative planned delivery over the flight. | Underpacing = at-risk underdelivery. Overpacing = burning budget too fast. |
| CPM / CPCV | Cost per mille (1000 impressions) / cost per completed view. | CTV inventory increasingly priced on CPCV not CPM — completion is the unit of trust. |
Netflix Ads architecture — the real entities
Sourced from Netflix's own engineering posts (Behind the Scenes: Building a Robust Ads Event Processing Pipeline, Introducing Impressions at Netflix) and the Hello Interview Netflix Ad Tracking writeup. If you're asked to design an ads system in R3, these are the blocks the interviewer expects you to name.
R2 / R3 prompts — verbatim from a recent onsite report
From a March 2026 Netflix Ads virtual onsite writeup (5-round breakdown). The SQL screen (R1-ish) was described in adjacent sources as "standard SQL with nothing crazy — aggregations and window functions." The differentiators are R2 (data modeling for ad delivery) and R3 (system design for audience targeting). These are the verbatim prompts to rehearse.
“Design a data model for advertising systems that supports real business scenarios such as ad delivery, performance tracking, and analytics.”
Name the grain of every fact table (one row per WHAT?). Distinguish exposure facts (impression_events) from conversion facts (play_events) from aggregate marts (campaign_daily). Show how the identity graph maps device → member → household for cross-device reach.
“What is the reasoning behind each field in the schema?”
Defend every column: 'event_id is the natural PK for dedup at the consumer; impression_ts is event-time, not ingest-time, because we backfill from late-arriving devices; campaign_id is denormalized for query speed since the campaigns dim is < 100k rows.' Don't ship `created_at` without a reason.
“How can the table structure support future expansion?”
Mention three patterns: (1) `attributes` JSON/MAP column for forward-compatible properties, (2) `event_type` enum on a single events table vs separate tables (trade-off: query speed vs schema rigidity), (3) campaign-hierarchy denormalization (campaign → line_item → creative) for OLAP friendliness.
“What happens when the data volume grows dramatically?”
Partition by event_date + bucket by member_id_hash. HLL sketches for COUNT(DISTINCT) at scale. Cold tier in Iceberg, hot in Druid. Netflix's own write-ups land here: Flink → Druid for sub-second, Iceberg for long-term.
“How would the system support multi-dimensional analytics queries?”
Pre-aggregate to common cuts (campaign × day, campaign × geo × day, campaign × device × day). Roll-up cube limits with audit trails on the cube definitions. For ad-hoc, push down to Druid with bitmap indexes.
“Design an Ads Audience Targeting System: large-scale user uploads, audience matching, high-concurrency delivery, scalable + fault-tolerant.”
Three layers: (1) audience ingestion (advertiser uploads CSV/API of hashed emails → identity resolution), (2) audience indexing (Bloom filters or HLL on member_id sets), (3) match-and-deliver at request time (sub-10ms lookups against precomputed segment membership). Discuss CAP: eventual consistency on segment membership is fine; pricing must be strong-consistency.
Mini case studies — rehearse out loud
Three cases that exercise the canonical ads-measurement reasoning. Read the prompt, talk through your answer out loud (2–3 minutes), then check against the structure. Repeat each one until the framework comes out without referring to the page.
Mini case 1 — Design the ad impressions fact table
An ads-measurement AE owns the `impression_events` fact table. Sketch the schema. Defend the grain. What dimensions hang off it? What's the partition strategy at 1M events/sec? How do you handle late-arriving events?
**Grain**: one row per validated ad impression `(ad_id, member_id, event_ts)`. **Columns**: PK `event_id` (for dedup), FKs to `members`, `campaigns`, `line_items`, `creatives`, `devices`, plus `event_ts` (event-time), `ingest_ts` (clock-skew check), `pod_position`, `slot_within_pod`, `viewable_flag` (MRC 2-sec/50%-pixel), `ivt_flag` (DoubleVerify/IAS), `quartile_reached` (0/25/50/75/100), `watch_seconds`, `country`, `app_version`. **Partition**: `PARTITIONED BY event_date` (Iceberg), bucketed by `member_id` hash for join-locality. **Late-arriving**: Psyberg-style watermark + 48h reconciliation window — facts written 24h ago can change. Downstream marts run with T-2 latency. **Anti-grain**: do NOT mix click events into the same table — different grain (per-click vs per-impression) and different attribution rules.
Mini case 2 — 1P/3P discrepancy investigation
Yesterday's campaign report shows: 1P (Netflix logs) = 10.2M impressions, DSP = 9.4M, IAS viewable = 8.9M. The advertiser is complaining. Walk me through how you'd investigate.
**First, frame**: 1P should be ≥ 3P always — Netflix counts at impression-decision, vendors filter for IVT and viewability downstream. So 10.2M → 9.4M (DSP) → 8.9M (IAS viewable) is the right *direction*. The advertiser's complaint is *magnitude*. **Industry tolerance** is ~10% per AdExchanger / TAG benchmarks; 10.2M → 9.4M is 7.8% gap (within tolerance) but 10.2M → 8.9M is 12.7% (out of tolerance for viewable). **Likely causes ranked**: (1) Counting method — 1P counts at `begin-to-render` (MRC 2014 standard), DSP counts at ad-decision (earlier). (2) IVT filtering asymmetry — IAS filters IVT, 1P doesn't yet. (3) Timestamp drift — DSP stamps UTC, 1P stamps PT; midnight rollover off by 7h. (4) Click dedup — double-click within 1s collapsed by 1P, counted by DSP. **The query**: full outer join on `(campaign_id, date_hour)` to bucket the gap. Reconcile timezone first, then attribute the residual to known causes. **What I'd ship**: a daily discrepancy dashboard with named cause buckets, not just a total gap %.
Mini case 3 — Frequency capping audit query
The campaign brief says 'no more than 3 impressions per member per day.' QA says they're seeing users with 5+ impressions. How do you audit it?
**Clarify first**: calendar day in *whose* timezone? Member-local or UTC? Rolling 24h or calendar? Cross-campaign within the same advertiser? Cross-device? **Then the audit query**: calendar-day variant is `SELECT member_id, DATE(event_ts), COUNT(*) FROM impressions WHERE campaign_id = @c GROUP BY 1,2 HAVING COUNT(*) > 3`. Rolling-24h needs `COUNT(*) OVER (PARTITION BY member_id ORDER BY event_ts RANGE BETWEEN INTERVAL '24 hour' PRECEDING AND CURRENT ROW) > 3`. **Production reality**: caps are enforced in the ad server, not SQL — the SQL is the *audit*, not the enforcement path. If audit shows violations, the cap consumer (Flink job in Netflix's case) is behind on state. **What I'd report**: # violators, average over-exposure depth, time-to-first-violation by member (drift signal — caps are tighter at start, loose at end of flight when the consumer lags).
A note on provenance
Real verbatim Netflix Ads SQL interview prompts are rare in public sources — the candidate corpus is small (the ad tier launched late 2022, in-house ad server fully cut over by mid-2025), and the highest-yield threads are gated on Blind and 1point3acres. The vocabulary, architecture, and case structures here are drawn from Netflix's own engineering posts (which describe the entities and the data-pipeline architecture authoritatively), the March 2026 dev.to onsite writeup (verbatim R2/R3 prompts), and the Hello Interview Netflix Ad Tracking technical study guide. The SQL drills (q21/q22/q23) test the canonical 1P measurement patterns named in the role title, reframed onto the existing pack's impression-events dataset.