Human-in-the-Loop Design Patterns for Automation and AI

Overview

Human-in-the-loop (HITL) is not “we have a person somewhere.” It is explicit design: which decisions require a human, what information reviewers see, the SLA for review, and how their feedback improves models and rules.

Quick definition

Human-in-the-loop automation defines explicit review queues with SLA, presents model outputs as structured proposals (not raw text), and logs accept/reject for training and audit.
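The accept/reject logging in that definition can be sketched as an append-only outcome record. A minimal sketch; the function and field names here are illustrative, not an established API:

```python
import json
import uuid
from datetime import datetime, timezone

def log_review_outcome(case_id: str, decision: str, reason: str, reviewer: str) -> dict:
    """Append-only outcome record: one row feeds both the audit trail and training data."""
    record = {
        "event_id": str(uuid.uuid4()),
        "case_id": case_id,
        "decision": decision,   # "accept" | "reject" | "edit"
        "reason": reason,       # free-text reason, later mined for retraining
        "reviewer": reviewer,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    # In production this would go to an immutable store; stdout stands in here.
    print(json.dumps(record))
    return record
```

Keeping the reason field mandatory is the design choice that makes the record usable for retraining, not just audit.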


Key patterns

HITL patterns include approve-to-send, exception queues, calibration sampling, and escalation tiers when automation confidence is low or stakes are high.
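Calibration sampling and the confidence gate combine naturally into one routing check. A minimal sketch, assuming a single threshold and a fixed sampling rate (both values illustrative):

```python
import random

def needs_human_review(confidence: float, threshold: float = 0.85,
                       calibration_rate: float = 0.05, rng=None) -> bool:
    """Low confidence always routes to a human; a small random sample of
    high-confidence items is also reviewed, to detect threshold drift."""
    rng = rng or random.Random()
    if confidence < threshold:
        return True  # escalation: below the confidence gate
    return rng.random() < calibration_rate  # calibration sampling
```

The sampled high-confidence reviews are what tell you whether the threshold is still trustworthy; without them, the gate silently degrades.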

Why it matters

Poor HITL design creates bottlenecks: everything goes to review, or risky actions auto-fire. Neither scales.

Core framework

The core framework is modeled step by step as TypeScript interfaces, so each checkpoint is machine-readable.

Classify decisions

TypeScript
/**
 * Classify decisions
 * Irreversible vs reversible; regulated vs non-regulated.
 */
export interface CoreFrameworkStep1ClassifyDecisions {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 0;
  /** Display title for this step */
  readonly title: "Classify decisions";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep1ClassifyDecisions_NARRATIVE: readonly string[] = [
  "Irreversible vs reversible; regulated vs non-regulated."
] as const;
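The two classification axes above can be made concrete as a routing function; the tier labels are illustrative, not a fixed taxonomy:

```python
def review_tier(irreversible: bool, regulated: bool) -> str:
    """Map the two classification axes to a review tier (labels are illustrative)."""
    if irreversible and regulated:
        return "mandatory_review"   # always a human, regardless of confidence
    if irreversible or regulated:
        return "confidence_gated"   # human only below the confidence threshold
    return "auto_with_sampling"     # automate; calibration-sample a small share
```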

Minimum review packet

TypeScript
/**
 * Minimum review packet
 * Show diffs, source excerpts, and recommended action—hide noise.
 */
export interface CoreFrameworkStep2MinimumReviewPacket {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 1;
  /** Display title for this step */
  readonly title: "Minimum review packet";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep2MinimumReviewPacket_NARRATIVE: readonly string[] = [
  "Show diffs, source excerpts, and recommended action—hide noise."
] as const;
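One way to assemble such a packet, assuming simple flat records; the field names and the evidence cap are illustrative:

```python
def build_review_packet(current: dict, proposed: dict,
                        evidence: list, recommendation: str) -> dict:
    """Show only the fields that change, plus source excerpts and the recommended action."""
    diff = {
        key: {"from": current.get(key), "to": value}
        for key, value in proposed.items()
        if current.get(key) != value
    }
    return {
        "diff": diff,              # reviewers see changes, not the full record
        "evidence": evidence[:3],  # top excerpts only; the rest is noise
        "recommended_action": recommendation,
    }
```

Computing the diff server-side, rather than showing both records, is what keeps review time per item low.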

Detailed breakdown

Each logic section is encoded as a Python function that returns a structured narrative payload.

Feedback loops

Python
def logic_block_1_feedback_loops(context: dict) -> dict:
    """Operational logic: Feedback loops"""
    # Narrative steps from the guide (logic section)
    paragraphs = [
        "Capture accept/reject reasons to retrain classifiers and adjust thresholds."
    ]
    return {
        "heading": "Feedback loops",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }
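A minimal sketch of one such loop: nudging the confidence gate from observed accept/reject outcomes. The target rate, step size, and bounds are illustrative and would be tuned per deployment:

```python
def adjust_threshold(threshold: float, outcomes: list,
                     target_accept_rate: float = 0.95, step: float = 0.01) -> float:
    """Nudge the confidence gate from review outcomes.
    If humans accept almost everything, the gate is too strict: relax it.
    If rejects are common, tighten it. Bounds keep the gate sane."""
    if not outcomes:
        return threshold
    accept_rate = sum(1 for o in outcomes if o == "accept") / len(outcomes)
    if accept_rate > target_accept_rate:
        threshold -= step
    elif accept_rate < target_accept_rate - 0.10:
        threshold += step
    return min(0.99, max(0.50, threshold))
```

Running this on a schedule, rather than per item, avoids the threshold oscillating on small samples.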

Technical patterns

Review task contract

JSON
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://primeaxiom.ai/schemas/review-task.json",
  "title": "ReviewTask",
  "type": "object",
  "required": ["case_id", "proposed_action", "confidence", "policy_version"],
  "properties": {
    "case_id": {
      "type": "string",
      "format": "uuid",
      "description": "Stable idempotency key for this review unit"
    },
    "proposed_action": {
      "type": "object",
      "required": ["type", "payload"],
      "properties": {
        "type": { "type": "string", "enum": ["crm_update", "deny", "escalate"] },
        "payload": { "type": "object", "additionalProperties": true }
      }
    },
    "evidence": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["ref", "kind"],
        "properties": {
          "ref": { "type": "string", "description": "chunk_id | doc_uri | message_id" },
          "kind": { "type": "string", "enum": ["retrieval", "ocr", "tool_output", "user_message"] },
          "score": { "type": "number", "minimum": 0, "maximum": 1 }
        }
      }
    },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "policy_version": { "type": "string", "pattern": "^[0-9]+\\.[0-9]+\\.[0-9]+$" },
    "sla_deadline_at": { "type": "string", "format": "date-time" }
  },
  "additionalProperties": false
}
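A minimal, dependency-free check against that schema. A full JSON Schema validator (e.g. the `jsonschema` library) would also enforce formats and patterns; this sketch covers only required fields and confidence bounds:

```python
def validate_review_task(task: dict) -> list:
    """Minimal stand-in for a JSON Schema validator: checks the
    required fields and confidence bounds of a ReviewTask."""
    errors = []
    for field in ("case_id", "proposed_action", "confidence", "policy_version"):
        if field not in task:
            errors.append(f"missing required field: {field}")
    confidence = task.get("confidence")
    if isinstance(confidence, (int, float)) and not 0 <= confidence <= 1:
        errors.append("confidence out of [0, 1]")
    return errors
```

Rejecting malformed tasks at enqueue time keeps the review queue itself trustworthy.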

Confidence gate

  • Scores below threshold must not trigger mutating tools; enqueue a ReviewTask instead.

This gate routes between automation and human review based on model confidence. Extend it with cohort-specific thresholds or model-calibrated scores.

Python
def route_by_confidence(extraction: dict, threshold: float = 0.85) -> dict:
    """Return path for workflow engine; no CRM writes when path is human_review."""
    score = float(extraction.get("confidence", 0.0))
    if score >= threshold:
        return {"path": "auto_apply", "confidence": score}
    return {
        "path": "human_review",
        "confidence": score,
        "reason": "below_threshold",
    }

# Example: branch before side effects
extraction = {"confidence": 0.62}
result = route_by_confidence(extraction, threshold=0.85)
if result["path"] == "human_review":
    enqueue_review_task(build_review_task_from_extraction(extraction))
else:
    apply_to_crm(extraction)

Code examples

Confidence score → manual review queue

Single threshold check before any CRM or tool side effects; tune `CONFIDENCE_THRESHOLD` per tenant or model version.

Python
MANUAL_REVIEW_QUEUE = "hitl_default"
CONFIDENCE_THRESHOLD = 0.82

def route_by_confidence_score(confidence_score: float) -> dict:
    """Below threshold: manual review queue; no mutating tools run."""
    if confidence_score < CONFIDENCE_THRESHOLD:
        return {
            "path": "manual_review",
            "queue": MANUAL_REVIEW_QUEUE,
            "confidence_score": confidence_score,
            "reason": "below_threshold",
        }
    return {
        "path": "automated",
        "confidence_score": confidence_score,
    }

# Usage before CRM write
decision = route_by_confidence_score(model_output["confidence"])
if decision["path"] == "manual_review":
    enqueue(MANUAL_REVIEW_QUEUE, payload=model_output)
else:
    crm.apply(model_output)

Proposal payload

The UI renders structured fields, which reduces mis-clicks.

TypeScript
interface Extraction {
  fields: Record<string, unknown>;
  confidence: number;
  storageUrl: string;
}

export function buildReviewTask(extraction: Extraction) {
  return {
    fields: extraction.fields,
    confidence: extraction.confidence,
    rawDocRef: extraction.storageUrl,
  };
}

System architecture

Text

[Inbound: model output + evidence bundle]
          |
          v
+------------------+
|  Confidence gate |
+------------------+
          |
     +----+----+
     |         |
  auto_ok  needs_human
     |         |
     v         v
 [Apply]  [ReviewTask queue + SLA timer]
               |
               v
     [Human UI: approve | edit | reject]
               |
               v
     [Immutable outcome row + audit]
               |
               v
     [Feedback / training dataset]
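The SLA timer in the queue stage can be sketched as a periodic sweep for overdue tasks. Field names follow the ReviewTask schema above; what happens to the returned cases (escalation to the next tier) is assumed:

```python
from datetime import datetime

def overdue_tasks(queue: list, now: datetime) -> list:
    """Return case_ids of ReviewTasks past their sla_deadline_at;
    these escalate to the next review tier."""
    overdue = []
    for task in queue:
        deadline = datetime.fromisoformat(task["sla_deadline_at"])
        if now > deadline:
            overdue.append(task["case_id"])
    return overdue
```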

Real-world example

A lender applied HITL only to income extractions below the confidence threshold, keeping throughput high while controlling risk.

Common mistakes

  • Review queues without SLAs—backlogs become dumpsters.
  • Humans retyping what automation should structurally fix.

PrimeAxiom designs HITL that matches your risk appetite—book a workflow review.