Pattern4 challenges

Workflow Ops

Pair stage_started ↔ stage_completed, detect in-flight via anti-join, count rework loops.

TL;DR

Studio workflows (ingest → transcode → QC → localize → publish) emit stage_started and stage_completed events into media_workflow_events. Every operations question is some flavor of pairing those two — cycle time, SLA miss rate, rework loops, in-flight detection.

Reach for this pattern when you see "cycle time", "SLA", "in-flight", or "rework". The interview signal is whether you understand that the event stream has separate started and completed rows — and that a stage started twice means rework, not error.

The skeleton

Pair started ↔ completed using FILTER aggregates. Earliest started with earliest completed survives rework loops cleanly.

-- Pattern: pair stage_started ↔ stage_completed safely
WITH paired AS (
  SELECT
    workflow_id, workflow_stage,
    -- Use MIN+FILTER so re-starts (rework) don't break the pair
    (MIN(event_timestamp) FILTER (WHERE event_type = 'stage_started'))::TIMESTAMP   AS started_at,
    (MIN(event_timestamp) FILTER (WHERE event_type = 'stage_completed'))::TIMESTAMP AS completed_at,
    (MIN(sla_due_timestamp) FILTER (WHERE event_type = 'stage_started'))::TIMESTAMP AS sla_due
  FROM media_workflow_events
  GROUP BY workflow_id, workflow_stage
)
SELECT
  workflow_stage,
  COUNT(*)                                                            AS n,
  -- Cycle time (q07): percentile_cont over EPOCH diff in hours
  ROUND(percentile_cont(0.50) WITHIN GROUP
        (ORDER BY EPOCH(completed_at - started_at) / 3600.0), 2)      AS p50_hours,
  ROUND(percentile_cont(0.90) WITHIN GROUP
        (ORDER BY EPOCH(completed_at - started_at) / 3600.0), 2)      AS p90_hours,
  -- SLA miss rate (q08): completed_at > sla_due
  ROUND(AVG(CASE WHEN completed_at > sla_due THEN 1.0 ELSE 0 END), 4) AS miss_rate
FROM paired
WHERE started_at IS NOT NULL
  AND completed_at IS NOT NULL
  AND completed_at >= started_at
GROUP BY workflow_stage;

The trap: "Started" is not "in-flight"

A stage that started and later completed has two rows: a stage_started and a stage_completed. The status column on the started row reflects status at the moment of starting — not the current state. Filtering WHERE event_type = 'stage_started' gives you every stage that ever started, including ones that finished hours ago.

To find truly in-flight stages, anti-join started events against completed events on (workflow_id, workflow_stage).

Wrong

-- ❌ Returns long-completed stages too — the status column is stale
SELECT workflow_id, workflow_stage,
       MIN(event_timestamp) AS started_at,
       sla_due_timestamp
FROM media_workflow_events
WHERE event_type = 'stage_started'
GROUP BY workflow_id, workflow_stage, sla_due_timestamp;

Right

-- ✅ Anti-join: started events that have NO matching completion
WITH starts AS (
  SELECT workflow_id, workflow_stage,
         MIN(event_timestamp) AS started_at,
         MIN(sla_due_timestamp) AS sla_due_timestamp
  FROM media_workflow_events
  WHERE event_type = 'stage_started'
  GROUP BY workflow_id, workflow_stage
),
done AS (
  SELECT DISTINCT workflow_id, workflow_stage
  FROM media_workflow_events
  WHERE event_type = 'stage_completed'
)
SELECT s.*
FROM starts s
LEFT JOIN done d USING (workflow_id, workflow_stage)
WHERE d.workflow_id IS NULL;            -- still in-flight