Retention & Cohorts
Gaps-and-islands streaks, monthly signup cohorts, multi-signal churn risk — censoring the incomplete.
TL;DR
Daily roll-ups in engagement_daily are the spine. Three shapes recur: streaks (consecutive-day runs via gaps-and-islands), cohort retention (group members by signup month, measure Day-N return), and risk scoring (combine recency + frequency + tenure into a churn flag).
Reach for this pattern when you see "7-day", "consecutive", "Day-30/60/90", "cohort", or "churn risk". The senior signal is whether you censor incomplete cohorts— measuring Day-90 on members who only have 30 days of data is the classic "churn spiked!" false alarm.
The skeleton
Two skeletons in one: gaps-and-islands for streaks, and a date-windowed retention check for cohorts. Both lean on engagement_daily.
-- (1) Streaks via gaps-and-islands (q13)
WITH active_days AS (
SELECT member_id, date
FROM engagement_daily
WHERE active_viewing_flag = TRUE
),
grouped AS (
-- Subtract row_number-in-days from the date.
-- Consecutive days share the same streak_grp.
SELECT member_id, date,
date - INTERVAL (
ROW_NUMBER() OVER (PARTITION BY member_id ORDER BY date)
) DAY AS streak_grp
FROM active_days
)
SELECT member_id,
COUNT(*) AS streak_days,
MIN(date) AS streak_start,
MAX(date) AS streak_end
FROM grouped
GROUP BY member_id, streak_grp
HAVING COUNT(*) >= << N >>;
-- (2) Day-N retention by signup cohort (q14)
WITH cutoff AS (SELECT MAX(date) AS max_date FROM engagement_daily),
eligible AS (
-- CENSORING: only cohorts old enough to have full N days of data
SELECT member_id, signup_date,
DATE_TRUNC('month', signup_date)::DATE AS cohort_month
FROM members, cutoff
WHERE signup_date + INTERVAL << N >> DAY <= max_date
AND is_bot = FALSE
)
SELECT cohort_month,
COUNT(*) AS cohort_size,
SUM(CASE
WHEN EXISTS (
SELECT 1 FROM engagement_daily ed
WHERE ed.member_id = eligible.member_id
AND ed.active_viewing_flag
AND ed.date BETWEEN signup_date + INTERVAL (<< N >> - 6) DAY
AND signup_date + INTERVAL (<< N >> + 1) DAY
) THEN 1 ELSE 0
END) AS retained_dN
FROM eligible
GROUP BY cohort_month
ORDER BY cohort_month;The trap: Failing to censor incomplete cohorts
Day-90 retention for the cohort that signed up two weeks agois NULL by construction — they don't havea Day-90 yet. Naïve aggregation treats those NULLs as zero, drags the cohort's retention rate down, and you report "churn spiked!" when really the cohort is just too young to measure.
Censor: only include cohorts where signup_date + N days ≤ MAX(date). Same logic for streaks — don't flag "short streak" for members whose active window doesn't reach the threshold yet.
Wrong
-- ❌ Recent cohorts have NULL Day-90 → "churn spiked" false alarm
SELECT DATE_TRUNC('month', signup_date) AS cohort_month,
AVG(d90_retained::INT) AS d90_rate
FROM members m
LEFT JOIN cohort_membership_d90 USING (member_id)
GROUP BY cohort_month;Right
-- ✅ Filter cohorts to those with full N days of post-signup data
WITH cutoff AS (SELECT MAX(date) AS max_date FROM engagement_daily)
SELECT DATE_TRUNC('month', m.signup_date)::DATE AS cohort_month,
AVG(d90_retained::INT) AS d90_rate
FROM members m, cutoff c
WHERE m.signup_date + INTERVAL 90 DAY <= c.max_date -- censoring
GROUP BY cohort_month;