Framework5 steps · ~5 min read

Stop solving each SQL problem from scratch

Every SQL interview question is the same loop — you just don't realize it until you've done 30 of them. This page is the loop, in 5 verbs. Read it once. Then read it again before each challenge. After 5 challenges, the scaffold lives in your head, and every future problem feels like a variation, not a stranger.

The loop, at a glance

#StepIn one line
1
Restate the grainSay out loud: one row per ___
2
Name the filtersWhat subset am I computing over?
3
Pick the shapeMatch the question to a SQL primitive
4
List the trapsWhat edge cases ALWAYS bite this shape?
5
VerifySanity-check before you submit
1

Restate the grain

Say out loud: one row per ___

Before any SQL, name the unit of analysis the question wants. A row per member? per (member, day)? per title? per workflow stage?

This is the single highest-signal move you can make. Juniors start writing joins. Seniors say "one row per (member, sitting), where a sitting means …" first. Once the grain is named, the SQL almost writes itself.

Try it on these pack questions
  • q01 — "Top 10 titles by hours." Grain = title
  • q11 — "Sessionize playback." Grain = (member, derived session)
  • q14 — "Cohort retention." Grain = monthly cohort — but the work happens at (cohort, member) first, then rolls up.
2

Name the filters

What subset am I computing over?

Most bugs in interview SQL aren't logic bugs — they're filter bugs. The most common: forgetting that playback_events contains impressions and detail views, not just plays. Forgetting is_bot = FALSE. Forgetting that engagement_daily has zero-watch days you don't want to average over.

Before writing, list every filter the question implies — even the obvious ones. "Only play_start. Only this experiment. Only the last 30 days. Exclude bots."

Senior signal

Asking "is bot traffic already filtered upstream?"reads as seven years of experience. Don't assume — name it.

3

Pick the shape

Match the question to a SQL primitive

A senior carries a small vocabulary of ~10 shapes and matches every new question to one. The pack's 5 pattern guides name the most common ones. Train your ear to hear them:

When the question says…Reach for…
total / sum / mean / hours-watchedGROUP BY + aggregate
consecutive days / streak / habitgaps-and-islands
session / sitting / 30-min gapLAG + cumulative SUM
funnel / impression → play / within N minself-join with time predicate (aggregate first to dedup)
in-flight / never-completed / still-openanti-join (LEFT JOIN … WHERE NULL)
p95 / p50 / tail / worst 5%percentile_cont WITHIN GROUP
dedupe / latest copyROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) + WHERE rn = 1
dedupe but keep group flagabove + BOOL_OR(...) OVER (PARTITION BY id)
cohort / retention / day-Nflag per member per N, aggregate to cohort (censor incomplete)
A/B lift / variant comparisonper-member aggregate, then variant-level AVG
chi-square / SRM / hygienenamed-formula: sum of (observed − expected)² / expected

When you read a new question and can't match it to a shape, you're missing vocabulary — read more pack solutions. When you CAN match it, half the SQL is already written.

4

List the traps

What edge cases ALWAYS bite this shape?

Every SQL shape has its native traps. Reading dozens of solutions teaches you to anticipate them before writing. The discipline: before you start typing, list 2–3 specific traps you'll watch for.

Fan-out from multi-match joins
Funnels, time-window self-joins. One impression matching 3 plays counts 3 times unless you aggregate first.
NULL → silent skip in AVG / SUM
LEFT JOIN with no match returns NULL. AVG ignores NULL; SUM returns NULL on all-NULL group. COALESCE early.
Same-timestamp ties
ORDER BY event_ts alone is non-deterministic when 2 events share a ms. Tie-break: (event_ts, event_id).
>= vs >
Boundary semantics matter. 30 min apart: same session or new? 90 days exactly: censor or include? Read the spec.
Integer division
SUM(int)/SUM(int) = int in some engines. Cast to DOUBLE or multiply by 1.0.
Censored cohorts
Members who signed up < N days ago can't have Day-N retention. Filter them out or you'll report a fake churn spike.
BIGINT result of COUNT
DuckDB-WASM returns BIGINT as JS BigInt — can't JSON.serialize. Cast to INT in the final SELECT if a downstream tool needs it.
TIMESTAMPTZ → DATE in WASM
Direct cast doesn't compile in DuckDB-WASM. Use ::TIMESTAMP::DATE intermediate cast (see /datasets gotcha #3).
5

Verify

Sanity-check before you submit

The grader will tell you yes-or-no. A senior tells themselves yes BEFORE the grader runs — with a 30-second sanity check on the output. Five questions that catch 90% of bugs:

  1. Row count plausible?If you expected ~15 cohorts and got 31, the censor filter didn't fire.
  2. Range checks pass? Rates between 0 and 1? Percentages between 0 and 100? Streaks non-negative? Any negative count means something fanned out or got coalesced wrong.
  3. Identities hold? COUNT(*) = COUNT(DISTINCT key) for dedup queries. SUM(per_segment) = TOTAL for partition queries.
  4. Direction matches story? Treatment ≥ control for lift. Day-30 retention ≥ Day-90 retention. p90 ≥ p50. If the direction is wrong, you swapped labels or filters somewhere.
  5. One spot-check.Pick a specific row in your output. Trace it back to the source data manually — does the math check out? If you can't reconstruct one row by hand, you don't yet trust your query.
Why this step matters most

In an interview, the verification narration IS the interview. The interviewer is watching you decide whether to trust your own output. Talking through these checks out loud signals exactly the seniority bar they're grading on.

How to use this when solving a challenge

Every challenge in this pack has an Aprofundar tab with these exact 5 numbered steps, in this exact order: 1. Restate the grain · 2. Name the filters · 3. Pick the shape · 4. List the traps · 5. Verify. Same scaffold, different content, every time. After 5 challenges you'll start anticipating each step before you scroll to it — that's the loop living in your head. (A separate "Interview narration" footer shows you how to verbalize the work in the room.)