Pattern4 challenges

Experimentation

Variant joins, SRM checks, ITT vs TOT, guardrail metrics — the experiment platform questions.

TL;DR

Four shapes appear on the Netflix Experimentation Platform screen: lift (mean metric per variant over a fixed post-assignment window), SRM (was the variant split actually 50/50?), ITT vs TOT (analyze everyone assigned vs. only those actually exposed), and guardrails (did the win on metric A cost us metric B?). All four join experiment_assignments to a behavior table.

Reach for this pattern when you see "A/B", "variant", "lift", "treatment vs control", "SRM", "ITT", or "guardrail." The senior signal is whether you understand the difference between ITT and TOT and pick the right one for the question.

The skeleton

7-day lift skeleton: per-member sum of the metric across the assignment+6-day window, then mean by variant. Swap the metric, the join table, and the window — same skeleton.

-- Pattern: 7-day post-assignment lift, per variant
WITH per_member AS (
  SELECT
    ea.member_id,
    ea.variant,
    SUM(ed.watch_seconds) AS metric_7d          -- swap for any metric
  FROM experiment_assignments ea
  JOIN engagement_daily ed
    ON ed.member_id = ea.member_id
   AND ed.date BETWEEN ea.assignment_timestamp::DATE
                   AND ea.assignment_timestamp::DATE + INTERVAL 6 DAY
  WHERE ea.experiment_id = '<< EXPERIMENT_ID >>'
    -- TOT? add: AND ea.exposed_flag = TRUE
    -- ITT? leave it out — analyze everyone assigned
  GROUP BY ea.member_id, ea.variant
)
SELECT
  variant,
  COUNT(*)               AS n_members,
  ROUND(AVG(metric_7d), 2) AS mean_metric_per_member,
  -- Lift vs control:
  ROUND(AVG(metric_7d)
    / (SELECT AVG(metric_7d) FROM per_member WHERE variant = 'control')
    - 1, 4) AS lift_pct
FROM per_member
GROUP BY variant
ORDER BY variant;

-- SRM check (q18): expected 50/50 split is binomial; chi-square one-liner:
-- SUM((observed - expected)^2 / expected) across variants. Flag p < 0.001.

The trap: Confusing ITT with TOT

ITT (intent-to-treat) counts everyone assignedto a variant — including assignees who never saw the treatment surface. It's the unbiased estimator: assignment is random, exposure isn't.

TOT (treatment-on-the-treated) restricts to exposed_flag = TRUE. It's biased upward because exposure correlates with engagement — unexposed users tend to be less engaged in the first place. Reporting TOT lift as if it were ITT is the senior-interview red flag.

Pick by the question. Decision-making? ITT. Estimating treatment effect on actually-treated population? TOT, and disclose the selection bias.

Wrong

-- ❌ Reporting TOT as if it were ITT — selection bias inflates the lift
SELECT variant, AVG(metric_7d) AS mean_per_member
FROM experiment_assignments ea
JOIN engagement_daily ed ON ed.member_id = ea.member_id
                        AND ed.date BETWEEN ea.assignment_timestamp::DATE
                                       AND ea.assignment_timestamp::DATE + 6
WHERE ea.experiment_id = 'homepage_ranking_v2'
  AND ea.exposed_flag = TRUE                  -- TOT only!
GROUP BY variant;

Right

-- ✅ Report both, labeled, so the reader sees the gap
WITH per_member AS (
  SELECT ea.member_id, ea.variant, ea.exposed_flag,
         SUM(ed.watch_seconds) AS metric_7d
  FROM experiment_assignments ea
  JOIN engagement_daily ed ON ed.member_id = ea.member_id
                          AND ed.date BETWEEN ea.assignment_timestamp::DATE
                                         AND ea.assignment_timestamp::DATE + 6
  WHERE ea.experiment_id = 'homepage_ranking_v2'
  GROUP BY ea.member_id, ea.variant, ea.exposed_flag
)
SELECT variant,
       AVG(metric_7d) FILTER (WHERE TRUE)              AS itt_mean,
       AVG(metric_7d) FILTER (WHERE exposed_flag)      AS tot_mean,
       COUNT(*) FILTER (WHERE TRUE)                    AS itt_n,
       COUNT(*) FILTER (WHERE exposed_flag)            AS tot_n
FROM per_member
GROUP BY variant;