Experimentation
Variant joins, SRM checks, ITT vs TOT, guardrail metrics — the experiment platform questions.
TL;DR
Four shapes appear on the Netflix Experimentation Platform screen: lift (mean metric per variant over a fixed post-assignment window), SRM (was the variant split actually 50/50?), ITT vs TOT (analyze everyone assigned vs. only those actually exposed), and guardrails (did the win on metric A cost us metric B?). All four join experiment_assignments to a behavior table.
Reach for this pattern when you see "A/B", "variant", "lift", "treatment vs control", "SRM", "ITT", or "guardrail." The senior signal is whether you understand the difference between ITT and TOT and pick the right one for the question.
The skeleton
7-day lift skeleton: per-member sum of the metric across the assignment+6-day window, then mean by variant. Swap the metric, the join table, and the window — same skeleton.
-- Pattern: 7-day post-assignment lift, per variant
WITH per_member AS (
SELECT
ea.member_id,
ea.variant,
SUM(ed.watch_seconds) AS metric_7d -- swap for any metric
FROM experiment_assignments ea
JOIN engagement_daily ed
ON ed.member_id = ea.member_id
AND ed.date BETWEEN ea.assignment_timestamp::DATE
AND ea.assignment_timestamp::DATE + INTERVAL 6 DAY
WHERE ea.experiment_id = '<< EXPERIMENT_ID >>'
-- TOT? add: AND ea.exposed_flag = TRUE
-- ITT? leave it out — analyze everyone assigned
GROUP BY ea.member_id, ea.variant
)
SELECT
variant,
COUNT(*) AS n_members,
ROUND(AVG(metric_7d), 2) AS mean_metric_per_member,
-- Lift vs control:
ROUND(AVG(metric_7d)
/ (SELECT AVG(metric_7d) FROM per_member WHERE variant = 'control')
- 1, 4) AS lift_pct
FROM per_member
GROUP BY variant
ORDER BY variant;
-- SRM check (q18): expected 50/50 split is binomial; chi-square one-liner:
-- SUM((observed - expected)^2 / expected) across variants. Flag p < 0.001.The trap: Confusing ITT with TOT
ITT (intent-to-treat) counts everyone assignedto a variant — including assignees who never saw the treatment surface. It's the unbiased estimator: assignment is random, exposure isn't.
TOT (treatment-on-the-treated) restricts to exposed_flag = TRUE. It's biased upward because exposure correlates with engagement — unexposed users tend to be less engaged in the first place. Reporting TOT lift as if it were ITT is the senior-interview red flag.
Pick by the question. Decision-making? ITT. Estimating treatment effect on actually-treated population? TOT, and disclose the selection bias.
Wrong
-- ❌ Reporting TOT as if it were ITT — selection bias inflates the lift
SELECT variant, AVG(metric_7d) AS mean_per_member
FROM experiment_assignments ea
JOIN engagement_daily ed ON ed.member_id = ea.member_id
AND ed.date BETWEEN ea.assignment_timestamp::DATE
AND ea.assignment_timestamp::DATE + 6
WHERE ea.experiment_id = 'homepage_ranking_v2'
AND ea.exposed_flag = TRUE -- TOT only!
GROUP BY variant;Right
-- ✅ Report both, labeled, so the reader sees the gap
WITH per_member AS (
SELECT ea.member_id, ea.variant, ea.exposed_flag,
SUM(ed.watch_seconds) AS metric_7d
FROM experiment_assignments ea
JOIN engagement_daily ed ON ed.member_id = ea.member_id
AND ed.date BETWEEN ea.assignment_timestamp::DATE
AND ea.assignment_timestamp::DATE + 6
WHERE ea.experiment_id = 'homepage_ranking_v2'
GROUP BY ea.member_id, ea.variant, ea.exposed_flag
)
SELECT variant,
AVG(metric_7d) FILTER (WHERE TRUE) AS itt_mean,
AVG(metric_7d) FILTER (WHERE exposed_flag) AS tot_mean,
COUNT(*) FILTER (WHERE TRUE) AS itt_n,
COUNT(*) FILTER (WHERE exposed_flag) AS tot_n
FROM per_member
GROUP BY variant;