Observability: Logs, Traces, and “Why Did Automation Do That?”

Overview

Business automation needs operator-grade observability—not only DevOps metrics. This guide ties user-visible outcomes to internal traces.

Quick definition

Automation observability ties `trace_id` across HTTP ingress, queue workers, and external APIs with structured logs (`level`, `service`, `correlation_id`) and RED/USE metrics per workflow.
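The definition above can be made concrete as the shape of a single structured log line. This is a sketch, not a prescribed schema: `workflow` and `duration_ms` are illustrative additions, and the example values are made up.

```typescript
// Sketch of one structured log line tying trace, correlation, and
// workflow context together. Field values here are illustrative.
interface WorkflowLogLine {
  level: "debug" | "info" | "warn" | "error";
  service: string;
  trace_id: string;        // shared across HTTP ingress, workers, external APIs
  correlation_id: string;  // business-level key, e.g. an order ID
  workflow: string;
  msg: string;
  duration_ms?: number;    // feeds RED latency metrics per workflow
}

const line: WorkflowLogLine = {
  level: "info",
  service: "queue-worker",
  trace_id: "4bf92f3577b34da6a3ce929d0e0e4736",
  correlation_id: "order-8841",
  workflow: "refund-approval",
  msg: "refund.approved",
  duration_ms: 112,
};

console.log(JSON.stringify(line));
```

Emitting one JSON object per line keeps the record trivially parseable by any log aggregator.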


Definition

Observability for automation means correlating customer-visible actions (emails sent, records updated) with internal steps (rules fired, model outputs, tool calls).

Why it matters

Without traces, teams debate blame instead of fixing root causes—especially with AI steps.

Core framework

The core framework below is a step-by-step model expressed as TypeScript interfaces, so each step's checkpoints are machine-readable.

Correlation IDs

TypeScript
/**
 * Correlation IDs
 * Propagate across webhooks, workers, and CRM updates.
 */
export interface CoreFrameworkStep1CorrelationIDs {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 0;
  /** Display title for this step */
  readonly title: "Correlation IDs";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep1CorrelationIDs_NARRATIVE: readonly string[] = [
  "Propagate across webhooks, workers, and CRM updates."
] as const;
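Propagation itself can be sketched as follows. `handleWebhook` and `workerStep` are hypothetical stand-ins for your ingress handler and queue worker; the header name `x-correlation-id` is a common convention, not a standard.

```typescript
// Sketch: carry one correlation ID from a webhook into a queued job
// and then into downstream work. Names here are hypothetical.
import { randomUUID } from "node:crypto";

interface JobPayload {
  correlationId: string;
  task: string;
  data: Record<string, unknown>;
}

function handleWebhook(headers: Record<string, string>): JobPayload {
  // Reuse the caller's ID when present; mint one otherwise.
  const correlationId = headers["x-correlation-id"] ?? randomUUID();
  return { correlationId, task: "sync-contact", data: {} };
}

function workerStep(job: JobPayload): string {
  // Every downstream log line and API call carries the same ID.
  return `crm.update correlation_id=${job.correlationId}`;
}

const job = handleWebhook({ "x-correlation-id": "req-123" });
console.log(workerStep(job)); // → crm.update correlation_id=req-123
```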

Business events in logs

TypeScript
/**
 * Business events in logs
 * Log domain language—not only stack traces.
 */
export interface CoreFrameworkStep2BusinessEventsInLogs {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 1;
  /** Display title for this step */
  readonly title: "Business events in logs";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep2BusinessEventsInLogs_NARRATIVE: readonly string[] = [
  "Log domain language—not only stack traces."
] as const;
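What "domain language" means in practice: the event name is a business fact ("invoice.sent"), not a stack frame. A minimal sketch, with illustrative event names:

```typescript
// Sketch: a domain event is named after what the business did,
// not where the code failed. Event names are illustrative.
type DomainEvent = {
  event: string;                       // e.g. "invoice.sent"
  entity: string;                      // e.g. an invoice ID
  outcome: "ok" | "skipped" | "failed";
};

function logDomainEvent(e: DomainEvent): string {
  return JSON.stringify({ ...e, ts: Date.now() });
}

console.log(logDomainEvent({ event: "invoice.sent", entity: "inv-501", outcome: "ok" }));
```

A "skipped" outcome is as important to log as a failure: silent non-action is exactly what error-only logging misses.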

Detailed breakdown

Each logic section below is encoded as a Python function returning a structured narrative payload.

Sampling for AI

Python
def logic_block_1_sampling_for_ai(context: dict) -> dict:
    """Operational logic: Sampling for AI"""
    # Narrative steps from the guide (logic section)
    paragraphs = ["Store prompts/outputs per policy; redact PII automatically."]
    return {
        "heading": "Sampling for AI",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }
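The two mechanics that logic block names, redaction and sampling, can be sketched as below. This assumes email addresses are the PII of concern; real policies cover far more (names, phone numbers, account IDs), and the 10% sampling rate is purely illustrative.

```typescript
// Sketch: redact PII before storage, and sample only a fraction of
// AI calls for full prompt/output retention. Rate and regex are
// illustrative assumptions, not a complete PII policy.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function redact(text: string): string {
  return text.replace(EMAIL_RE, "[REDACTED_EMAIL]");
}

function shouldStoreFullPayload(traceId: string, rate = 0.1): boolean {
  // Hash-based sampling keeps the decision stable per trace, so all
  // spans of one trace are either fully stored or none are.
  let h = 0;
  for (const c of traceId) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100 < rate * 100;
}

console.log(redact("Contact jane@example.com for approval"));
// → Contact [REDACTED_EMAIL] for approval
```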

Technical patterns

Trace propagation

  • Incoming request sets or continues `traceparent`; pass to job payload.
  • Child spans for each external API call.
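The two bullets above can be sketched as a small helper. The `traceparent` format (`version-traceId-parentSpanId-flags`) comes from the W3C Trace Context specification; `continueTrace` is a hypothetical helper name, and in production an OpenTelemetry SDK would do this for you.

```typescript
// Sketch: continue a W3C `traceparent` header into a job payload,
// minting a new child span ID but keeping the trace ID.
import { randomBytes } from "node:crypto";

function continueTrace(traceparent?: string): { traceId: string; spanId: string } {
  const spanId = randomBytes(8).toString("hex");    // new child span
  if (traceparent) {
    const [, traceId] = traceparent.split("-");     // keep the incoming trace
    return { traceId, spanId };
  }
  return { traceId: randomBytes(16).toString("hex"), spanId };  // new root
}

const ctx = continueTrace("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
console.log(ctx.traceId); // → 4bf92f3577b34da6a3ce929d0e0e4736
```

Place the resulting context into the queue job payload so each external API call can open a child span under the same trace.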

Code examples

Structured log helper

Consistent JSON for log aggregation.

TypeScript
export function logWorkflow(
  ctx: Record<string, unknown>,
  level: "debug" | "info" | "warn" | "error",
  msg: string,
  extra: Record<string, unknown> = {}
): void {
  // One JSON object per line: trivially parseable by any log aggregator.
  console.log(JSON.stringify({ level, msg, ...ctx, ...extra, ts: Date.now() }));
}

System architecture

YAML
pipeline:
  - services and workers
  - OTel collector
  - trace + log backend
dashboards:
  - p95 latency
  - error rate by workflow
alerts:
  - SLO burn
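The collector stage could be wired with a minimal OpenTelemetry Collector configuration along these lines; the exporter endpoint is a placeholder, and a real deployment would add batching limits, retries, and authentication.

```yaml
# Minimal OTel Collector sketch: OTLP in, batch, OTLP/HTTP out.
# The endpoint below is a placeholder.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  otlphttp:
    endpoint: https://traces.example.internal
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```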

Real-world example

A support org cut MTTR by half using trace views showing which rule branch fired before a wrong escalation.

Common mistakes

  • Logging only errors—misses silent wrong decisions.
  • No retention policy for debug data—privacy risk.

PrimeAxiom implements traceable workflows—book an observability design session.