Pattern3 challenges

Retention & Cohorts

Gaps-and-islands streaks, monthly signup cohorts, multi-signal churn risk — censoring the incomplete.

TL;DR

Daily roll-ups in engagement_daily are the spine. Three shapes recur: streaks (consecutive-day runs via gaps-and-islands), cohort retention (group members by signup month, measure Day-N return), and risk scoring (combine recency + frequency + tenure into a churn flag).

Reach for this pattern when you see "7-day", "consecutive", "Day-30/60/90", "cohort", or "churn risk". The senior signal is whether you censor incomplete cohorts— measuring Day-90 on members who only have 30 days of data is the classic "churn spiked!" false alarm.

The skeleton

Two skeletons in one: gaps-and-islands for streaks, and a date-windowed retention check for cohorts. Both lean on engagement_daily.

-- (1) Streaks via gaps-and-islands (q13)
WITH active_days AS (
  SELECT member_id, date
  FROM engagement_daily
  WHERE active_viewing_flag = TRUE
),
grouped AS (
  -- Subtract row_number-in-days from the date.
  -- Consecutive days share the same streak_grp.
  SELECT member_id, date,
         date - INTERVAL (
           ROW_NUMBER() OVER (PARTITION BY member_id ORDER BY date)
         ) DAY AS streak_grp
  FROM active_days
)
SELECT member_id,
       COUNT(*)  AS streak_days,
       MIN(date) AS streak_start,
       MAX(date) AS streak_end
FROM grouped
GROUP BY member_id, streak_grp
HAVING COUNT(*) >= << N >>;

-- (2) Day-N retention by signup cohort (q14)
WITH cutoff AS (SELECT MAX(date) AS max_date FROM engagement_daily),
eligible AS (
  -- CENSORING: only cohorts old enough to have full N days of data
  SELECT member_id, signup_date,
         DATE_TRUNC('month', signup_date)::DATE AS cohort_month
  FROM members, cutoff
  WHERE signup_date + INTERVAL << N >> DAY <= max_date
    AND is_bot = FALSE
)
SELECT cohort_month,
       COUNT(*) AS cohort_size,
       SUM(CASE
             WHEN EXISTS (
               SELECT 1 FROM engagement_daily ed
               WHERE ed.member_id = eligible.member_id
                 AND ed.active_viewing_flag
                 AND ed.date BETWEEN signup_date + INTERVAL (<< N >> - 6) DAY
                                 AND signup_date + INTERVAL (<< N >> + 1) DAY
             ) THEN 1 ELSE 0
           END) AS retained_dN
FROM eligible
GROUP BY cohort_month
ORDER BY cohort_month;

The trap: Failing to censor incomplete cohorts

Day-90 retention for the cohort that signed up two weeks agois NULL by construction — they don't havea Day-90 yet. Naïve aggregation treats those NULLs as zero, drags the cohort's retention rate down, and you report "churn spiked!" when really the cohort is just too young to measure.

Censor: only include cohorts where signup_date + N days ≤ MAX(date). Same logic for streaks — don't flag "short streak" for members whose active window doesn't reach the threshold yet.

Wrong

-- ❌ Recent cohorts have NULL Day-90 → "churn spiked" false alarm
SELECT DATE_TRUNC('month', signup_date) AS cohort_month,
       AVG(d90_retained::INT)           AS d90_rate
FROM members m
LEFT JOIN cohort_membership_d90 USING (member_id)
GROUP BY cohort_month;

Right

-- ✅ Filter cohorts to those with full N days of post-signup data
WITH cutoff AS (SELECT MAX(date) AS max_date FROM engagement_daily)
SELECT DATE_TRUNC('month', m.signup_date)::DATE AS cohort_month,
       AVG(d90_retained::INT)                   AS d90_rate
FROM members m, cutoff c
WHERE m.signup_date + INTERVAL 90 DAY <= c.max_date     -- censoring
GROUP BY cohort_month;