Pattern6 challenges

Engagement Metrics

Aggregate playback with caps, percentiles, and time windows — without falling for rewatch inflation.

TL;DR

Aggregate playback_events over a time window and produce a metric — total hours, completion rate, rebuffer ratio, startup p95. Every one of these questions is a WHERE + GROUP BY + an aggregate function on the same event stream.

Reach for this pattern when the question reads "most-watched X," "completion rate," "quality-of-experience by Y," or "p95 / p99 of Z." The challenge isn't the SQL shape — it's knowing which filter to apply, which cap to enforce, and which aggregate truly answers the metric without inflation.

The skeleton

Three slots to fill: the time window, the filter (event_name + entry_source if needed), and the aggregate. Cap watch_seconds at runtime if rewatch inflation matters.

-- Pattern: engagement metric over a time window
WITH bounded AS (
  SELECT
    p.<< DIMENSION >>,                          -- title_id, country, device_type, ...
    -- For valued-hours questions, cap rewatch:
    LEAST(p.watch_seconds, t.runtime_minutes * 60) AS valued_seconds,
    p.watch_seconds,
    p.rebuffer_seconds,
    p.startup_time_ms
  FROM playback_events p
  JOIN titles t USING (title_id)               -- drop if you don't need runtime
  WHERE p.event_name = 'play_start'            -- 'play_start' is the canonical filter
    -- Anchor windows to the data's MAX, not CURRENT_DATE — tests are deterministic:
    AND p.event_ts >= ((SELECT MAX(event_ts) FROM playback_events)::TIMESTAMP
                       - INTERVAL << N >> DAY)
)
SELECT
  << DIMENSION >>,
  -- Pick ONE aggregate that answers the question:
  ROUND(SUM(watch_seconds) / 3600.0, 2)                                  AS hours,
  ROUND(SUM(valued_seconds) / 3600.0, 2)                                 AS valued_hours,
  ROUND(SUM(rebuffer_seconds) * 1.0 / NULLIF(SUM(watch_seconds), 0), 4) AS rebuffer_ratio,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY startup_time_ms)          AS p95_startup_ms,
  AVG(CASE WHEN valued_seconds >= 0.9 * /*runtime_seconds*/ ...
           THEN 1.0 ELSE 0 END)                                          AS completion_rate
FROM bounded
GROUP BY << DIMENSION >>
ORDER BY << DIMENSION >>;

The trap: Rewatch inflation

A member loops the first 6 minutes of a 60-minute show ten times. Naïve SUM(watch_seconds) says they watched 60 minutes and naive completion logic marks them as a finisher. They never finished.

Cap each play at the title's runtime beforesumming — and cap the per-(member, title) sum again, so multiple plays covering different segments can collectively reach the runtime but rewatchers cannot. This is exactly how Netflix's public Top-10 "views" metric works.

Wrong

-- ❌ Rewatchers inflate the total
SELECT title_id,
       SUM(watch_seconds) AS hours_secs
FROM playback_events
WHERE event_name = 'play_start'
GROUP BY title_id;

Right

-- ✅ Cap each play at runtime, then cap the sum again
WITH per_play AS (
  SELECT member_id, title_id,
         LEAST(watch_seconds, t.runtime_minutes * 60) AS valued
  FROM playback_events p
  JOIN titles t USING (title_id)
  WHERE event_name = 'play_start'
),
per_member_title AS (
  SELECT member_id, title_id,
         LEAST(SUM(valued), MAX(t.runtime_minutes) * 60) AS valued
  FROM per_play
  JOIN titles t USING (title_id)
  GROUP BY member_id, title_id
)
SELECT title_id, SUM(valued) AS valued_seconds
FROM per_member_title
GROUP BY title_id;