Observability: Logs, Traces, and “Why Did Automation Do That?”
Overview
Business automation needs operator-grade observability—not only DevOps metrics. This guide ties user-visible outcomes to internal traces.
Quick definition
Automation observability ties `trace_id` across HTTP ingress, queue workers, and external APIs with structured logs (`level`, `service`, `correlation_id`) and RED/USE metrics per workflow.
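A minimal sketch of what one such structured log record could look like. The fields `level`, `service`, and `correlation_id` come from the definition above; `workflow`, `step`, and the sample values are illustrative assumptions, not a prescribed schema.

```typescript
// Illustrative shape of one structured log record; `workflow` and `step`
// are assumed field names added for the example.
interface WorkflowLogRecord {
  level: "debug" | "info" | "warn" | "error";
  service: string;
  correlation_id: string; // same value across ingress, workers, external calls
  trace_id: string;       // ties this record to the distributed trace
  workflow: string;
  step: string;
  msg: string;
  ts: number;             // epoch milliseconds
}

const record: WorkflowLogRecord = {
  level: "info",
  service: "crm-sync-worker",
  correlation_id: "c-7f3a",
  trace_id: "4bf92f3577b34da6a3ce929d0e0e4736",
  workflow: "lead-escalation",
  step: "rule-evaluated",
  msg: "escalation rule matched",
  ts: Date.now(),
};
console.log(JSON.stringify(record));
```

Emitting every record as one JSON object per line is what lets an aggregator filter by `trace_id` or `workflow` later.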
Definition
Observability for automation means correlating customer-visible actions (emails sent, records updated) with internal steps (rules fired, model outputs, tool calls).
Why it matters
Without traces, teams debate blame instead of fixing root causes—especially with AI steps.
Core framework
Step-by-step model as TypeScript interfaces (machine-readable checkpoints).
Correlation IDs
/**
* Correlation IDs
* Propagate across webhooks, workers, and CRM updates.
*/
export interface CoreFrameworkStep1CorrelationIDs {
/** Order in the core framework (0-based) */
readonly stepIndex: 0;
/** Display title for this step */
readonly title: "Correlation IDs";
/** Narrative checkpoints as published in the guide */
readonly narrative: readonly string[];
}
export const CoreFrameworkStep1CorrelationIDs_NARRATIVE: readonly string[] = [
"Propagate across webhooks, workers, and CRM updates."
] as const;

Business events in logs
/**
* Business events in logs
* Log domain language—not only stack traces.
*/
export interface CoreFrameworkStep2BusinessEventsInLogs {
/** Order in the core framework (0-based) */
readonly stepIndex: 1;
/** Display title for this step */
readonly title: "Business events in logs";
/** Narrative checkpoints as published in the guide */
readonly narrative: readonly string[];
}
export const CoreFrameworkStep2BusinessEventsInLogs_NARRATIVE: readonly string[] = [
"Log domain language—not only stack traces."
] as const;

Detailed breakdown
Logic sections encoded as Python functions with structured narrative payloads.
Sampling for AI
def logic_block_1_sampling_for_ai(context: dict) -> dict:
    """Operational logic: Sampling for AI"""
    # Narrative steps from the guide (logic section)
    paragraphs = ["Store prompts/outputs per policy; redact PII automatically."]
    return {
        "heading": "Sampling for AI",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }

Technical patterns
Trace propagation
- Incoming request sets or continues `traceparent`; pass to job payload.
- Child spans for each external API call.
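The propagation steps above can be sketched as follows. The `JobPayload` shape, `toJobPayload` helper, and header handling are hypothetical names for illustration; only the W3C `traceparent` format (`version-traceId-spanId-flags`) is standard.

```typescript
// Sketch: continue an incoming W3C traceparent (or start a new one) and
// attach it to the job payload so the worker can open child spans.
// `JobPayload` and `toJobPayload` are hypothetical names for this example.
interface JobPayload {
  traceparent: string;
  body: unknown;
}

function randomHex(bytes: number): string {
  return Array.from({ length: bytes * 2 }, () =>
    Math.floor(Math.random() * 16).toString(16)
  ).join("");
}

function newTraceparent(): string {
  // version-traceId-spanId-flags per the W3C Trace Context format
  return `00-${randomHex(16)}-${randomHex(8)}-01`;
}

function toJobPayload(
  headers: Record<string, string>,
  body: unknown
): JobPayload {
  const traceparent = headers["traceparent"] ?? newTraceparent();
  return { traceparent, body };
}
```

A worker would read `payload.traceparent` when it picks up the job and create one child span per external API call, as the bullets above describe.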
Code examples
Structured log helper
Consistent JSON for log aggregation.
export function logWorkflow(
  ctx: Record<string, unknown>,
  level: string,
  msg: string,
  extra: Record<string, unknown> = {}
): void {
  console.log(JSON.stringify({ level, msg, ...ctx, ...extra, ts: Date.now() }));
}

System architecture
[Services + workers]
→ [OTel collector]
→ [Trace + log backend]
→ [Dashboards: p95 latency, error rate by workflow]
→ [Alerts on SLO burn]

Real-world example
A support org cut MTTR by half using trace views showing which rule branch fired before a wrong escalation.
Common mistakes
- Logging only errors—misses silent wrong decisions.
- No retention policy for debug data—privacy risk.
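The "Sampling for AI" step and the retention mistake above both call for redaction before debug data is stored. A minimal sketch, assuming regex-based scrubbing; the two patterns here (emails, US-style phone numbers) are illustrative and far from exhaustive, and a real policy would pair a vetted PII-detection library with a retention schedule.

```typescript
// Illustrative redaction pass applied before prompts/outputs are persisted.
// Covers only emails and US-style phone numbers; real PII detection
// needs a dedicated library, not two regexes.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE = /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g;

function redact(text: string): string {
  return text.replace(EMAIL, "[email]").replace(PHONE, "[phone]");
}
```

Running this at the logging boundary keeps raw PII out of the trace and log backend entirely, which is easier to defend than deleting it later.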
Next steps
PrimeAxiom implements traceable workflows; book an observability design session.