Pattern3 challenges

Funnels & Sessions

Stitch impression → detail → play with attribution windows; derive sessions from event streams.

TL;DR

Two related shapes. Funnels stitch a sequence of events (impression → detail → play) for the same user within an attribution window. Sessionspartition a single user's events into bursts separated by inactivity gaps. Both lean on self-joins over playback_events with time-window predicates.

Reach for this pattern when you see "rate of A → B", "within N minutes", "same session", or "time to X". The senior-interview signal is whether you control for fan-out: a naïve join can match one impression to many downstream events and triple-count it.

The skeleton

Funnel skeleton: aggregate to one row per impression-key (member, title, session) before joining downstream events. Sessionization skeleton lives further down inside q11's solution — same shape with LAG + gap > 30 min.

-- Funnel: A → B within an attribution window
WITH base AS (
  -- 1) Dedup to one row per impression-key. MIN(event_ts) is the impression ts.
  SELECT member_id, title_id, session_id,
         MIN(event_ts)::TIMESTAMP AS imp_ts
  FROM playback_events
  WHERE entry_source = '<< SURFACE >>'         -- e.g. 'homepage_row'
    AND event_name   = 'impression'
  GROUP BY member_id, title_id, session_id
),
next_step AS (
  -- 2) Find earliest downstream event within the window, then aggregate to
  --    impression-key. Aggregating BEFORE counting is what kills fan-out.
  SELECT b.member_id, b.title_id, b.session_id,
         MIN(e.event_ts)::TIMESTAMP AS step_ts
  FROM base b
  JOIN playback_events e
    ON e.member_id  = b.member_id
   AND e.title_id   = b.title_id
   AND e.session_id = b.session_id
   AND e.event_name = '<< NEXT_EVENT >>'        -- e.g. 'title_detail_view'
   AND e.event_ts::TIMESTAMP BETWEEN b.imp_ts
                                  AND b.imp_ts + INTERVAL '<< N >>' MINUTE
  GROUP BY b.member_id, b.title_id, b.session_id
)
SELECT
  COUNT(*)                              AS impressions,
  COUNT(n.member_id)                    AS converted,
  ROUND(100.0 * COUNT(n.member_id)
              / NULLIF(COUNT(*), 0), 2) AS conversion_pct
FROM base b
LEFT JOIN next_step n USING (member_id, title_id, session_id);

The trap: Fan-out from multi-match joins

One impression event can match multiple downstream events that satisfy the time window. If you join and then COUNT(*) the result, you count that impression once per downstream event— conversion rates climb past 100% and you can't reproduce them.

The fix: aggregate before counting. Collapse each side of the join to one row per impression-key with MIN(event_ts) first. Now the join is impression-to-key, not impression-to-event-stream, and conversion counts make sense.

Wrong

-- ❌ Fan-out: one impression with 3 matching plays counts as 3
SELECT COUNT(*) FILTER (WHERE p.event_name = 'play_start')
       / COUNT(*) FILTER (WHERE i.event_name = 'impression') AS rate
FROM playback_events i
JOIN playback_events p
  ON p.member_id = i.member_id
 AND p.title_id  = i.title_id
 AND p.event_ts BETWEEN i.event_ts AND i.event_ts + INTERVAL 30 MINUTE
WHERE i.event_name = 'impression';

Right

-- ✅ Aggregate impressions and plays to one row per key, then compare
WITH imps AS (
  SELECT member_id, title_id, session_id,
         MIN(event_ts)::TIMESTAMP AS imp_ts
  FROM playback_events
  WHERE event_name = 'impression'
  GROUP BY member_id, title_id, session_id
),
plays AS (
  SELECT i.member_id, i.title_id, i.session_id
  FROM imps i
  JOIN playback_events p
    ON p.member_id = i.member_id
   AND p.title_id  = i.title_id
   AND p.session_id = i.session_id
   AND p.event_name = 'play_start'
   AND p.event_ts::TIMESTAMP BETWEEN i.imp_ts
                                  AND i.imp_ts + INTERVAL 30 MINUTE
  GROUP BY i.member_id, i.title_id, i.session_id
)
SELECT COUNT(*) AS impressions,
       (SELECT COUNT(*) FROM plays) AS plays;