Human-in-the-Loop Design Patterns for Automation and AI
Overview
Human-in-the-loop (HITL) is not “we have a person somewhere.” It is explicit design: which decisions require humans, what information reviewers see, the SLA for review, and how feedback improves models and rules.
Quick definition
Human-in-the-loop automation defines explicit review queues with SLA, presents model outputs as structured proposals (not raw text), and logs accept/reject for training and audit.
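A minimal sketch of that accept/reject log; the ReviewOutcome shape and the append-only list standing in for an audit store are illustrative, not a prescribed schema.
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ReviewOutcome:
    case_id: str
    decision: str        # "accept" | "edit" | "reject"
    reason: str          # coded or free-text reason, reused for retraining
    policy_version: str
    decided_at: str      # ISO 8601 timestamp

def record_outcome(audit_log: list[dict], case_id: str, decision: str,
                   reason: str, policy_version: str) -> ReviewOutcome:
    """Append one immutable outcome row; training and audit jobs read this log."""
    outcome = ReviewOutcome(
        case_id=case_id,
        decision=decision,
        reason=reason,
        policy_version=policy_version,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(asdict(outcome))
    return outcome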
Definition
HITL patterns include approve-to-send, exception queues, calibration sampling, and escalation tiers when automation confidence is low or stakes are high.
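One way to sketch the choice between these patterns; the thresholds and the high_stakes flag are assumptions for illustration, not fixed values.
def select_hitl_pattern(confidence: float, high_stakes: bool) -> str:
    """Map automation confidence and decision stakes to a HITL pattern."""
    if high_stakes:
        return "approve_to_send"      # a human approves every outbound action
    if confidence < 0.60:
        return "escalation_tier"      # route to a senior or specialist reviewer
    if confidence < 0.85:
        return "exception_queue"      # a routine reviewer handles the outliers
    return "calibration_sampling"     # auto-apply, audit a random sample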
Why it matters
Poor HITL design creates bottlenecks: everything goes to review, or risky actions auto-fire. Neither scales.
Core framework
The step-by-step model is encoded as TypeScript interfaces (machine-readable checkpoints).
Classify decisions
/**
* Classify decisions
* Irreversible vs reversible; regulated vs non-regulated.
*/
export interface CoreFrameworkStep1ClassifyDecisions {
/** Order in the core framework (0-based) */
readonly stepIndex: 0;
/** Display title for this step */
readonly title: "Classify decisions";
/** Narrative checkpoints as published in the guide */
readonly narrative: readonly string[];
}
export const CoreFrameworkStep1ClassifyDecisions_NARRATIVE: readonly string[] = [
"Irreversible vs reversible; regulated vs non-regulated."
] as const;
Minimum review packet
/**
* Minimum review packet
* Show diffs, source excerpts, and recommended action—hide noise.
*/
export interface CoreFrameworkStep2MinimumReviewPacket {
/** Order in the core framework (0-based) */
readonly stepIndex: 1;
/** Display title for this step */
readonly title: "Minimum review packet";
/** Narrative checkpoints as published in the guide */
readonly narrative: readonly string[];
}
export const CoreFrameworkStep2MinimumReviewPacket_NARRATIVE: readonly string[] = [
"Show diffs, source excerpts, and recommended action—hide noise."
] as const;
Detailed breakdown
Logic sections are encoded as Python functions with structured narrative payloads.
Feedback loops
def logic_block_1_feedback_loops(context: dict) -> dict:
    """Operational logic: Feedback loops"""
    # Narrative steps from the guide (logic section)
    paragraphs = ["Capture accept/reject reasons to retrain classifiers and adjust thresholds."]
    return {
        "heading": "Feedback loops",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }
Technical patterns
Review task contract
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://primeaxiom.ai/schemas/review-task.json",
"title": "ReviewTask",
"type": "object",
"required": ["case_id", "proposed_action", "confidence", "policy_version"],
"properties": {
"case_id": {
"type": "string",
"format": "uuid",
"description": "Stable idempotency key for this review unit"
},
"proposed_action": {
"type": "object",
"required": ["type", "payload"],
"properties": {
"type": { "type": "string", "enum": ["crm_update", "deny", "escalate"] },
"payload": { "type": "object", "additionalProperties": true }
}
},
"evidence": {
"type": "array",
"items": {
"type": "object",
"required": ["ref", "kind"],
"properties": {
"ref": { "type": "string", "description": "chunk_id | doc_uri | message_id" },
"kind": { "type": "string", "enum": ["retrieval", "ocr", "tool_output", "user_message"] },
"score": { "type": "number", "minimum": 0, "maximum": 1 }
}
}
},
"confidence": { "type": "number", "minimum": 0, "maximum": 1 },
"policy_version": { "type": "string", "pattern": "^[0-9]+\.[0-9]+\.[0-9]+$" },
"sla_deadline_at": { "type": "string", "format": "date-time" }
},
"additionalProperties": false
}
Confidence gate
- Scores below the threshold must not trigger mutating tools; enqueue a ReviewTask instead.
The function below routes between automatic application and human review based on model confidence. Extend it with cohort-specific thresholds or model-calibrated scores.
def route_by_confidence(extraction: dict, threshold: float = 0.85) -> dict:
    """Return path for workflow engine — no CRM writes when path is human_review."""
    score = float(extraction.get("confidence", 0.0))
    if score >= threshold:
        return {"path": "auto_apply", "confidence": score}
    return {
        "path": "human_review",
        "confidence": score,
        "reason": "below_threshold",
    }

# Example: branch before side effects
# (enqueue_review_task, build_review_task_from_extraction, and apply_to_crm are
# placeholders for your queue and CRM integrations)
extraction = {"confidence": 0.62}
result = route_by_confidence(extraction, threshold=0.85)
if result["path"] == "human_review":
    enqueue_review_task(build_review_task_from_extraction(extraction))
else:
    apply_to_crm(extraction)
Code examples
Confidence score → manual review queue
Single threshold check before any CRM or tool side effects; tune `CONFIDENCE_THRESHOLD` per tenant or model version.
MANUAL_REVIEW_QUEUE = "hitl_default"
CONFIDENCE_THRESHOLD = 0.82

def route_by_confidence_score(confidence_score: float) -> dict:
    """Below threshold → manual review queue; no mutating tools run."""
    if confidence_score < CONFIDENCE_THRESHOLD:
        return {
            "path": "manual_review",
            "queue": MANUAL_REVIEW_QUEUE,
            "confidence_score": confidence_score,
            "reason": "below_threshold",
        }
    return {
        "path": "automated",
        "confidence_score": confidence_score,
    }

# Usage before CRM write
# (enqueue, crm.apply, and model_output are placeholders for your integrations)
decision = route_by_confidence_score(model_output["confidence"])
if decision["path"] == "manual_review":
    enqueue(MANUAL_REVIEW_QUEUE, payload=model_output)
else:
    crm.apply(model_output)
Proposal payload
The UI renders structured fields rather than raw text, which reduces mis-clicks.
// Parameter shape inferred from the fields used below.
export function buildReviewTask(extraction: {
  fields: Record<string, unknown>;
  confidence: number;
  storageUrl: string;
}) {
  return {
    fields: extraction.fields,
    confidence: extraction.confidence,
    rawDocRef: extraction.storageUrl,
  };
}
System architecture
[Inbound: model output + evidence bundle]
                 |
                 v
       +-----------------+
       | Confidence gate |
       +-----------------+
                 |
          +------+------+
          |             |
       auto_ok     needs_human
          |             |
          v             v
      [Apply]   [ReviewTask queue + SLA timer]
                        |
                        v
         [Human UI: approve | edit | reject]
                        |
                        v
        [Immutable outcome row + audit]
                        |
                        v
         [Feedback / training dataset]
Real-world example
A lender used HITL only for income extraction below a confidence threshold, keeping throughput while controlling risk.
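A sketch of that selective, field-level gating; the field names, threshold, and extraction shape are illustrative.
REVIEW_FIELDS = {"income"}   # assumed high-risk fields
FIELD_THRESHOLD = 0.85       # assumed per-field threshold

def split_fields(extraction: dict) -> tuple[dict, dict]:
    """Partition extracted fields into (auto_apply, needs_review)."""
    auto_apply, needs_review = {}, {}
    for name, field in extraction["fields"].items():
        risky = name in REVIEW_FIELDS and field["confidence"] < FIELD_THRESHOLD
        (needs_review if risky else auto_apply)[name] = field
    return auto_apply, needs_review

# Example: only the low-confidence income field goes to a reviewer
extraction = {"fields": {
    "income": {"value": 82000, "confidence": 0.71},
    "employer": {"value": "Acme Corp", "confidence": 0.97},
}}
auto_apply, needs_review = split_fields(extraction)   # income lands in needs_review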
Common mistakes
- Review queues without SLAs: backlogs become dumpsters (see the overdue-check sketch after this list).
- Humans retyping what automation should structurally fix.
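A minimal overdue check against the ReviewTask contract above; sla_deadline_at comes from the schema, while the escalation action itself is left to your queue tooling and is assumed here.
from datetime import datetime, timezone

def find_overdue_tasks(tasks: list[dict], now: datetime | None = None) -> list[dict]:
    """Return review tasks whose sla_deadline_at has passed (ISO 8601 with offset assumed)."""
    now = now or datetime.now(timezone.utc)
    return [
        task for task in tasks
        if datetime.fromisoformat(task["sla_deadline_at"]) <= now
    ]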
PrimeAxiom designs HITL that matches your risk appetite—book a workflow review.