Stop solving each SQL problem from scratch
Every SQL interview question is the same loop — you just don't realize it until you've done 30 of them. This page is the loop, in 5 verbs. Read it once. Then read it again before each challenge. After 5 challenges, the scaffold lives in your head, and every future problem feels like a variation, not a stranger.
The loop, at a glance
| # | Step | In one line |
|---|---|---|
1 | Restate the grain | Say out loud: one row per ___ |
2 | Name the filters | What subset am I computing over? |
3 | Pick the shape | Match the question to a SQL primitive |
4 | List the traps | What edge cases ALWAYS bite this shape? |
5 | Verify | Sanity-check before you submit |
Restate the grain
Say out loud: one row per ___
Before any SQL, name the unit of analysis the question wants. A row per member? per (member, day)? per title? per workflow stage?
This is the single highest-signal move you can make. Juniors start writing joins. Seniors say "one row per (member, sitting), where a sitting means …" first. Once the grain is named, the SQL almost writes itself.
- q01 — "Top 10 titles by hours." Grain =
title - q11 — "Sessionize playback." Grain =
(member, derived session) - q14 — "Cohort retention." Grain =
monthly cohort— but the work happens at(cohort, member)first, then rolls up.
Name the filters
What subset am I computing over?
Most bugs in interview SQL aren't logic bugs — they're filter bugs. The most common: forgetting that playback_events contains impressions and detail views, not just plays. Forgetting is_bot = FALSE. Forgetting that engagement_daily has zero-watch days you don't want to average over.
Before writing, list every filter the question implies — even the obvious ones. "Only play_start. Only this experiment. Only the last 30 days. Exclude bots."
Asking "is bot traffic already filtered upstream?"reads as seven years of experience. Don't assume — name it.
Pick the shape
Match the question to a SQL primitive
A senior carries a small vocabulary of ~10 shapes and matches every new question to one. The pack's 5 pattern guides name the most common ones. Train your ear to hear them:
| When the question says… | Reach for… |
|---|---|
| total / sum / mean / hours-watched | GROUP BY + aggregate |
| consecutive days / streak / habit | gaps-and-islands |
| session / sitting / 30-min gap | LAG + cumulative SUM |
| funnel / impression → play / within N min | self-join with time predicate (aggregate first to dedup) |
| in-flight / never-completed / still-open | anti-join (LEFT JOIN … WHERE NULL) |
| p95 / p50 / tail / worst 5% | percentile_cont WITHIN GROUP |
| dedupe / latest copy | ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) + WHERE rn = 1 |
| dedupe but keep group flag | above + BOOL_OR(...) OVER (PARTITION BY id) |
| cohort / retention / day-N | flag per member per N, aggregate to cohort (censor incomplete) |
| A/B lift / variant comparison | per-member aggregate, then variant-level AVG |
| chi-square / SRM / hygiene | named-formula: sum of (observed − expected)² / expected |
When you read a new question and can't match it to a shape, you're missing vocabulary — read more pack solutions. When you CAN match it, half the SQL is already written.
List the traps
What edge cases ALWAYS bite this shape?
Every SQL shape has its native traps. Reading dozens of solutions teaches you to anticipate them before writing. The discipline: before you start typing, list 2–3 specific traps you'll watch for.
Verify
Sanity-check before you submit
The grader will tell you yes-or-no. A senior tells themselves yes BEFORE the grader runs — with a 30-second sanity check on the output. Five questions that catch 90% of bugs:
- Row count plausible?If you expected ~15 cohorts and got 31, the censor filter didn't fire.
- Range checks pass? Rates between 0 and 1? Percentages between 0 and 100? Streaks non-negative? Any negative count means something fanned out or got coalesced wrong.
- Identities hold?
COUNT(*) = COUNT(DISTINCT key)for dedup queries.SUM(per_segment) = TOTALfor partition queries. - Direction matches story? Treatment ≥ control for lift. Day-30 retention ≥ Day-90 retention. p90 ≥ p50. If the direction is wrong, you swapped labels or filters somewhere.
- One spot-check.Pick a specific row in your output. Trace it back to the source data manually — does the math check out? If you can't reconstruct one row by hand, you don't yet trust your query.
In an interview, the verification narration IS the interview. The interviewer is watching you decide whether to trust your own output. Talking through these checks out loud signals exactly the seniority bar they're grading on.
How to use this when solving a challenge
Every challenge in this pack has an Aprofundar tab with these exact 5 numbered steps, in this exact order: 1. Restate the grain · 2. Name the filters · 3. Pick the shape · 4. List the traps · 5. Verify. Same scaffold, different content, every time. After 5 challenges you'll start anticipating each step before you scroll to it — that's the loop living in your head. (A separate "Interview narration" footer shows you how to verbalize the work in the room.)