Engagement Metrics
Aggregate playback with caps, percentiles, and time windows — without falling for rewatch inflation.
TL;DR
Aggregate playback_events over a time window and produce a metric — total hours, completion rate, rebuffer ratio, startup p95. Every one of these questions is a WHERE + GROUP BY + an aggregate function on the same event stream.
Reach for this pattern when the question reads "most-watched X," "completion rate," "quality-of-experience by Y," or "p95 / p99 of Z." The challenge isn't the SQL shape — it's knowing which filter to apply, which cap to enforce, and which aggregate truly answers the metric without inflation.
The skeleton
Three slots to fill: the time window, the filter (event_name + entry_source if needed), and the aggregate. Cap watch_seconds at runtime if rewatch inflation matters.
-- Pattern: engagement metric over a time window
WITH bounded AS (
SELECT
p.<< DIMENSION >>, -- title_id, country, device_type, ...
-- For valued-hours questions, cap rewatch:
LEAST(p.watch_seconds, t.runtime_minutes * 60) AS valued_seconds,
p.watch_seconds,
p.rebuffer_seconds,
p.startup_time_ms
FROM playback_events p
JOIN titles t USING (title_id) -- drop if you don't need runtime
WHERE p.event_name = 'play_start' -- 'play_start' is the canonical filter
-- Anchor windows to the data's MAX, not CURRENT_DATE — tests are deterministic:
AND p.event_ts >= ((SELECT MAX(event_ts) FROM playback_events)::TIMESTAMP
- INTERVAL << N >> DAY)
)
SELECT
<< DIMENSION >>,
-- Pick ONE aggregate that answers the question:
ROUND(SUM(watch_seconds) / 3600.0, 2) AS hours,
ROUND(SUM(valued_seconds) / 3600.0, 2) AS valued_hours,
ROUND(SUM(rebuffer_seconds) * 1.0 / NULLIF(SUM(watch_seconds), 0), 4) AS rebuffer_ratio,
percentile_cont(0.95) WITHIN GROUP (ORDER BY startup_time_ms) AS p95_startup_ms,
AVG(CASE WHEN valued_seconds >= 0.9 * /*runtime_seconds*/ ...
THEN 1.0 ELSE 0 END) AS completion_rate
FROM bounded
GROUP BY << DIMENSION >>
ORDER BY << DIMENSION >>;The trap: Rewatch inflation
A member loops the first 6 minutes of a 60-minute show ten times. Naïve SUM(watch_seconds) says they watched 60 minutes and naive completion logic marks them as a finisher. They never finished.
Cap each play at the title's runtime beforesumming — and cap the per-(member, title) sum again, so multiple plays covering different segments can collectively reach the runtime but rewatchers cannot. This is exactly how Netflix's public Top-10 "views" metric works.
Wrong
-- ❌ Rewatchers inflate the total
SELECT title_id,
SUM(watch_seconds) AS hours_secs
FROM playback_events
WHERE event_name = 'play_start'
GROUP BY title_id;Right
-- ✅ Cap each play at runtime, then cap the sum again
WITH per_play AS (
SELECT member_id, title_id,
LEAST(watch_seconds, t.runtime_minutes * 60) AS valued
FROM playback_events p
JOIN titles t USING (title_id)
WHERE event_name = 'play_start'
),
per_member_title AS (
SELECT member_id, title_id,
LEAST(SUM(valued), MAX(t.runtime_minutes) * 60) AS valued
FROM per_play
JOIN titles t USING (title_id)
GROUP BY member_id, title_id
)
SELECT title_id, SUM(valued) AS valued_seconds
FROM per_member_title
GROUP BY title_id;