Human-in-the-Loop Design Patterns for Automation and AI

Overview

Human-in-the-loop (HITL) is not “we have a person somewhere.” It is explicit design: which decisions require humans, what information reviewers see, the SLA for each review, and how feedback improves models and rules.

Quick definition

Human-in-the-loop automation defines explicit review queues with SLAs, presents model outputs as structured proposals (not raw text), and logs accept/reject decisions for training and audit.


Definition

HITL patterns include approve-to-send, exception queues, calibration sampling, and escalation tiers when automation confidence is low or stakes are high.
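
Of the patterns listed, calibration sampling is the easiest to show in a few lines: a small share of items that would otherwise be auto-applied is sent to review anyway, so the threshold can be checked against human judgment. The sketch below is one possible approach, not a prescribed one. The function name `in_calibration_sample` and the 5% default rate are assumptions for illustration; hashing the case ID instead of drawing a random number keeps the decision stable across retries.

```python
import hashlib


def in_calibration_sample(case_id: str, sample_rate: float = 0.05) -> bool:
    """Deterministically select roughly sample_rate of cases for review.

    sample_rate is a hypothetical default; tune it to reviewer capacity.
    """
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    # Map the first 6 bytes (48 bits) to a float in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < sample_rate
```

Because the result depends only on the case ID, re-running the workflow for the same case never flips it in or out of the sample.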

Why it matters

Poor HITL design fails in one of two ways: everything goes to review and the queue becomes a bottleneck, or risky actions auto-fire without oversight. Neither scales.

Core framework

Classify decisions

Classify each decision along two axes: irreversible vs reversible, and regulated vs non-regulated.
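
The two axes can be turned into a small lookup that the workflow consults before acting. This is a minimal sketch under assumptions: the `DecisionClass` type, the `review_policy` function, and the three policy labels are hypothetical, and the mapping itself (strictest policy when both flags are set) is one plausible choice, not a rule from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionClass:
    irreversible: bool
    regulated: bool


def review_policy(decision: DecisionClass) -> str:
    """Map the two classification axes to a review policy label.

    The labels and the mapping are illustrative assumptions.
    """
    if decision.irreversible and decision.regulated:
        return "always_review"
    if decision.irreversible or decision.regulated:
        return "review_below_threshold"
    return "auto_with_sampling"
```

Keeping the mapping in one function makes it easy to audit and to version alongside the policy itself.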

Minimum review packet

Show diffs, source excerpts, and recommended action—hide noise.
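
A review packet along these lines can be assembled with the standard library. The sketch below assumes the current and proposed values are available as plain text; `build_review_packet` and the `max_excerpts` cap are hypothetical names chosen for illustration.

```python
import difflib


def build_review_packet(
    current: str,
    proposed: str,
    excerpts: list[str],
    recommended_action: str,
    max_excerpts: int = 3,
) -> dict:
    """Bundle a diff, a capped list of source excerpts, and the recommendation."""
    diff = list(
        difflib.unified_diff(
            current.splitlines(),
            proposed.splitlines(),
            fromfile="current",
            tofile="proposed",
            lineterm="",
            n=1,  # one line of context keeps the diff short
        )
    )
    return {
        "diff": diff,
        # Cap the excerpts so the reviewer sees only the strongest evidence.
        "excerpts": excerpts[:max_excerpts],
        "recommended_action": recommended_action,
    }
```

Capping excerpts and limiting diff context is one way to "hide noise"; the right limits depend on the document type.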


Detailed breakdown

Feedback loops

Capture accept/reject reasons to retrain classifiers and adjust thresholds.
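
As a rough illustration of threshold adjustment, logged review outcomes can be scanned for the lowest confidence cut at which reviewers still accepted a target share of proposals. This is a sketch, not a calibration method from the source: `suggest_threshold`, the `(confidence, accepted)` tuple shape, and the 0.95 target are assumptions.

```python
def suggest_threshold(outcomes, target_precision=0.95):
    """Return the lowest confidence cut whose accept rate meets the target.

    outcomes: iterable of (confidence, accepted) pairs from human reviews.
    Returns None when no cut meets the target.
    """
    ranked = sorted(outcomes, key=lambda o: o[0], reverse=True)
    accepted = 0
    best = None
    for i, (conf, ok) in enumerate(ranked, start=1):
        accepted += int(ok)
        # Evaluate only at the end of a run of equal scores, because a cut
        # at `conf` includes every item with that score.
        last_of_tie = i == len(ranked) or ranked[i][0] != conf
        if last_of_tie and accepted / i >= target_precision:
            best = conf
    return best
```

The estimate is only as good as the reviewed sample: if humans see only low-confidence items, the high-confidence range is never measured.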

Technical patterns

Review task contract

JSON
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://primeaxiom.ai/schemas/review-task.json",
  "title": "ReviewTask",
  "type": "object",
  "required": ["case_id", "proposed_action", "confidence", "policy_version"],
  "properties": {
    "case_id": {
      "type": "string",
      "format": "uuid",
      "description": "Stable idempotency key for this review unit"
    },
    "proposed_action": {
      "type": "object",
      "required": ["type", "payload"],
      "properties": {
        "type": { "type": "string", "enum": ["crm_update", "deny", "escalate"] },
        "payload": { "type": "object", "additionalProperties": true }
      }
    },
    "evidence": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["ref", "kind"],
        "properties": {
          "ref": { "type": "string", "description": "chunk_id | doc_uri | message_id" },
          "kind": { "type": "string", "enum": ["retrieval", "ocr", "tool_output", "user_message"] },
          "score": { "type": "number", "minimum": 0, "maximum": 1 }
        }
      }
    },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "policy_version": { "type": "string", "pattern": "^[0-9]+\\.[0-9]+\\.[0-9]+$" },
    "sla_deadline_at": { "type": "string", "format": "date-time" }
  },
  "additionalProperties": false
}
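
A producer can check the main constraints of this contract before enqueueing. The sketch below uses only the standard library and covers a subset of the schema (action type, confidence range, version pattern, UUID format); it is not a full JSON Schema validator. The function name `make_review_task` and the 60-minute default SLA are assumptions.

```python
import re
import uuid
from datetime import datetime, timedelta, timezone

SEMVER = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")
ALLOWED_ACTIONS = {"crm_update", "deny", "escalate"}


def make_review_task(case_id, action_type, payload, confidence,
                     policy_version, sla_minutes=60, now=None):
    """Build a ReviewTask dict, rejecting values the schema would refuse."""
    uuid.UUID(case_id)  # raises ValueError if case_id is not a UUID string
    if action_type not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action type: {action_type}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if not SEMVER.match(policy_version):
        raise ValueError("policy_version must look like 1.2.3")
    now = now or datetime.now(timezone.utc)
    return {
        "case_id": case_id,
        "proposed_action": {"type": action_type, "payload": payload},
        "evidence": [],
        "confidence": confidence,
        "policy_version": policy_version,
        # sla_minutes is a hypothetical default, not part of the schema.
        "sla_deadline_at": (now + timedelta(minutes=sla_minutes)).isoformat(),
    }
```

The caller supplies `case_id` instead of the function generating one, so a retried job reuses the same idempotency key.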

Confidence gate

  • Scores below threshold must not trigger mutating tools; enqueue a ReviewTask instead.

The gate routes each item to automation or human review based on model confidence. It can be extended with cohort-specific thresholds or model-calibrated scores.

Python
def route_by_confidence(extraction: dict, threshold: float = 0.85) -> dict:
    """Return path for workflow engine; no CRM writes when path is human_review."""
    score = float(extraction.get("confidence", 0.0))
    if score >= threshold:
        return {"path": "auto_apply", "confidence": score}
    return {
        "path": "human_review",
        "confidence": score,
        "reason": "below_threshold",
    }


# Example: branch before side effects.
# enqueue_review_task, build_review_task_from_extraction, and apply_to_crm
# are assumed to be provided by the surrounding workflow code.
extraction = {"confidence": 0.62}
result = route_by_confidence(extraction, threshold=0.85)
if result["path"] == "human_review":
    enqueue_review_task(build_review_task_from_extraction(extraction))
else:
    apply_to_crm(extraction)
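
One way to add cohort-specific thresholds is a lookup table with a default fallback. The cohort names and values below are made up for illustration, and `route_for_cohort` is a hypothetical helper, not part of any existing API.

```python
DEFAULT_THRESHOLD = 0.85
# Hypothetical cohorts: stricter for high-value cases, looser for low-risk ones.
COHORT_THRESHOLDS = {"high_value": 0.95, "low_risk": 0.75}


def route_for_cohort(extraction: dict, cohort: str) -> dict:
    """Pick the threshold for the cohort, then apply the same gate logic."""
    threshold = COHORT_THRESHOLDS.get(cohort, DEFAULT_THRESHOLD)
    score = float(extraction.get("confidence", 0.0))
    path = "auto_apply" if score >= threshold else "human_review"
    return {"path": path, "confidence": score, "threshold": threshold}
```

Returning the threshold that was applied makes each routing decision explainable in the audit log.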

Code examples

Confidence score → manual review queue

Single threshold check before any CRM or tool side effects; tune `CONFIDENCE_THRESHOLD` per tenant or model version.

Python
MANUAL_REVIEW_QUEUE = "hitl_default"
CONFIDENCE_THRESHOLD = 0.82


def route_by_confidence_score(confidence_score: float) -> dict:
    """Below threshold → manual review queue; no mutating tools run."""
    if confidence_score < CONFIDENCE_THRESHOLD:
        return {
            "path": "manual_review",
            "queue": MANUAL_REVIEW_QUEUE,
            "confidence_score": confidence_score,
            "reason": "below_threshold",
        }
    return {
        "path": "automated",
        "confidence_score": confidence_score,
    }


# Usage before CRM write
decision = route_by_confidence_score(model_output["confidence"])
if decision["path"] == "manual_review":
    enqueue(MANUAL_REVIEW_QUEUE, payload=model_output)
else:
    crm.apply(model_output)

Proposal payload

The UI renders structured fields rather than raw text, which reduces mis-clicks.

TypeScript
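// Field types are assumptions; adjust them to the extractor's real output.
interface Extraction {
  fields: Record<string, unknown>;
  confidence: number;
  storageUrl: string;
}

export function buildReviewTask(extraction: Extraction) {
  return {
    fields: extraction.fields,
    confidence: extraction.confidence,
    rawDocRef: extraction.storageUrl,
  };
}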

System architecture

Text
[Inbound: model output + evidence bundle]
                |
                v
       +------------------+
       | Confidence gate  |
       +------------------+
                |
           +----+----+
           |         |
        auto_ok  needs_human
           |         |
           v         v
        [Apply]  [ReviewTask queue + SLA timer]
                     |
                     v
         [Human UI: approve | edit | reject]
                     |
                     v
         [Immutable outcome row + audit]
                     |
                     v
         [Feedback / training dataset]
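
The "immutable outcome row" stage can be approximated in memory with frozen records chained by hash, so that replacing any earlier row is detectable. Hash chaining is a technique chosen here for illustration; the source does not say how immutability is enforced. `Outcome`, `append_outcome`, and `verify_chain` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Outcome:
    case_id: str
    decision: str  # "approve" | "edit" | "reject"
    reviewer: str
    reason: str
    prev_hash: str  # hash of the previous row, "" for the first row

    def row_hash(self) -> str:
        body = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(body).hexdigest()


def append_outcome(log, case_id, decision, reviewer, reason=""):
    """Append a new row that commits to the hash of the previous one."""
    if decision not in {"approve", "edit", "reject"}:
        raise ValueError(f"unknown decision: {decision}")
    prev = log[-1].row_hash() if log else ""
    row = Outcome(case_id, decision, reviewer, reason, prev)
    log.append(row)
    return row


def verify_chain(log) -> bool:
    """Return True if every row points at the hash of the row before it."""
    prev = ""
    for row in log:
        if row.prev_hash != prev:
            return False
        prev = row.row_hash()
    return True
```

In production the same idea usually lives in an append-only table or log store; the list here only demonstrates the check.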

Real-world example

A lender routed income extractions to human review only when confidence fell below the threshold, which preserved throughput while controlling risk.

Common mistakes

  • Review queues without SLAs—backlogs become dumpsters.
  • Humans retyping what automation should structurally fix.
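
To keep a queue from silently aging, a periodic job can list tasks past their `sla_deadline_at` and hand them to an escalation tier. A minimal sketch, assuming deadlines are ISO 8601 strings with an explicit UTC offset such as `+00:00`; `overdue_tasks` is a hypothetical helper.

```python
from datetime import datetime, timezone


def overdue_tasks(tasks, now=None):
    """Return tasks whose SLA deadline has passed, oldest deadline first.

    Assumes each task has an offset-aware ISO 8601 sla_deadline_at string.
    """
    now = now or datetime.now(timezone.utc)

    def deadline(task):
        return datetime.fromisoformat(task["sla_deadline_at"])

    late = [t for t in tasks if deadline(t) <= now]
    return sorted(late, key=deadline)
```

What happens to the returned tasks (reassign, page a supervisor, auto-deny) is a policy decision outside this sketch.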

PrimeAxiom designs HITL that matches your risk appetite—book a workflow review.