Workflow Ops
Pair stage_started ↔ stage_completed, detect in-flight via anti-join, count rework loops.
TL;DR
Studio workflows (ingest → transcode → QC → localize → publish) emit stage_started and stage_completed events into media_workflow_events. Every operations question is some flavor of pairing those two — cycle time, SLA miss rate, rework loops, in-flight detection.
Reach for this pattern when you see "cycle time", "SLA", "in-flight", or "rework". The interview signal is whether you understand that the event stream has separate started and completed rows — and that a stage started twice means rework, not error.
The skeleton
Pair started ↔ completed using FILTER aggregates. Earliest started with earliest completed survives rework loops cleanly.
-- Pattern: pair stage_started ↔ stage_completed safely
WITH paired AS (
SELECT
workflow_id, workflow_stage,
-- Use MIN+FILTER so re-starts (rework) don't break the pair
(MIN(event_timestamp) FILTER (WHERE event_type = 'stage_started'))::TIMESTAMP AS started_at,
(MIN(event_timestamp) FILTER (WHERE event_type = 'stage_completed'))::TIMESTAMP AS completed_at,
(MIN(sla_due_timestamp) FILTER (WHERE event_type = 'stage_started'))::TIMESTAMP AS sla_due
FROM media_workflow_events
GROUP BY workflow_id, workflow_stage
)
SELECT
workflow_stage,
COUNT(*) AS n,
-- Cycle time (q07): percentile_cont over EPOCH diff in hours
ROUND(percentile_cont(0.50) WITHIN GROUP
(ORDER BY EPOCH(completed_at - started_at) / 3600.0), 2) AS p50_hours,
ROUND(percentile_cont(0.90) WITHIN GROUP
(ORDER BY EPOCH(completed_at - started_at) / 3600.0), 2) AS p90_hours,
-- SLA miss rate (q08): completed_at > sla_due
ROUND(AVG(CASE WHEN completed_at > sla_due THEN 1.0 ELSE 0 END), 4) AS miss_rate
FROM paired
WHERE started_at IS NOT NULL
AND completed_at IS NOT NULL
AND completed_at >= started_at
GROUP BY workflow_stage;The trap: "Started" is not "in-flight"
A stage that started and later completed has two rows: a stage_started and a stage_completed. The status column on the started row reflects status at the moment of starting — not the current state. Filtering WHERE event_type = 'stage_started' gives you every stage that ever started, including ones that finished hours ago.
To find truly in-flight stages, anti-join started events against completed events on (workflow_id, workflow_stage).
Wrong
-- ❌ Returns long-completed stages too — the status column is stale
SELECT workflow_id, workflow_stage,
MIN(event_timestamp) AS started_at,
sla_due_timestamp
FROM media_workflow_events
WHERE event_type = 'stage_started'
GROUP BY workflow_id, workflow_stage, sla_due_timestamp;Right
-- ✅ Anti-join: started events that have NO matching completion
WITH starts AS (
SELECT workflow_id, workflow_stage,
MIN(event_timestamp) AS started_at,
MIN(sla_due_timestamp) AS sla_due_timestamp
FROM media_workflow_events
WHERE event_type = 'stage_started'
GROUP BY workflow_id, workflow_stage
),
done AS (
SELECT DISTINCT workflow_id, workflow_stage
FROM media_workflow_events
WHERE event_type = 'stage_completed'
)
SELECT s.*
FROM starts s
LEFT JOIN done d USING (workflow_id, workflow_stage)
WHERE d.workflow_id IS NULL; -- still in-flight