SLA Management and Escalation Automation

Overview

SLAs are promises. Automation should measure breach risk before a breach occurs, escalate with context, and log the reasons for retrospectives.

Quick definition

SLA automation stores pause/resume semantics (for example, a clock paused while waiting on the customer), computes breach deadlines in UTC, and escalates via scheduled per-ticket jobs rather than a cron job that naively polls entire tables.
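As a minimal sketch of the "deadlines in UTC" point, the helper below parses an ISO-8601 timestamp, adds a calendar-time budget, and returns the deadline as a UTC string. The function name `breachDeadlineUtc` and the 240-minute budget are assumptions for illustration; business-hours calendars are not handled here.

```typescript
// Hypothetical helper: the name and the 240-minute budget are illustrative.
export function breachDeadlineUtc(openedAtIso: string, budgetMinutes: number): string {
  // Date.parse returns epoch milliseconds, which are timezone-independent.
  const openedMs = Date.parse(openedAtIso);
  if (isNaN(openedMs)) {
    throw new RangeError("Invalid timestamp: " + openedAtIso);
  }
  // toISOString() always renders in UTC (trailing "Z").
  return new Date(openedMs + budgetMinutes * 60 * 1000).toISOString();
}

// A ticket opened at 01:30 in a UTC-05:00 office with a 4-hour calendar budget:
const exampleDeadline = breachDeadlineUtc("2024-03-10T01:30:00-05:00", 240);
console.log(exampleDeadline); // "2024-03-10T10:30:00.000Z"
```

Storing the deadline as a UTC instant keeps it stable when agents or customers sit in different time zones; local time is only needed for display.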


Definition

SLA automation combines business calendars, pause rules, priority matrices, and escalation paths across channels (email, SMS, Slack).
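One way to picture how these pieces fit together is a priority matrix keyed by ticket priority. The sketch below is an assumption-heavy illustration: the type names, the P1/P2/P3 levels, and every number are hypothetical, not taken from any product.

```typescript
// Hypothetical policy model: every name and number here is illustrative.
export type TicketPriority = "P1" | "P2" | "P3";
export type EscalationChannel = "email" | "sms" | "slack";

export interface SlaPolicy {
  /** Response budget in minutes. */
  responseMinutes: number;
  /** true: count business hours only; false: count calendar hours. */
  businessHoursOnly: boolean;
  /** Ticket statuses that pause the SLA clock. */
  pauseStatuses: readonly string[];
  /** Channels to notify, in order of preference. */
  escalationChannels: readonly EscalationChannel[];
}

const PRIORITY_MATRIX: Record<TicketPriority, SlaPolicy> = {
  P1: {
    responseMinutes: 60,
    businessHoursOnly: false,
    pauseStatuses: ["pending_customer"],
    escalationChannels: ["slack", "sms"],
  },
  P2: {
    responseMinutes: 240,
    businessHoursOnly: true,
    pauseStatuses: ["pending_customer"],
    escalationChannels: ["slack", "email"],
  },
  P3: {
    responseMinutes: 1440,
    businessHoursOnly: true,
    pauseStatuses: ["pending_customer"],
    escalationChannels: ["email"],
  },
};

/** Look up the policy that applies to a ticket of the given priority. */
export function lookupPolicy(priority: TicketPriority): SlaPolicy {
  return PRIORITY_MATRIX[priority];
}
```

Keeping the matrix as data rather than scattered conditionals makes it easier to audit which promise applies to which ticket.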

Why it matters

Reactive firefighting burns out teams; proactive escalation retains customers and preserves the data needed for improvement.

Core framework

Step-by-step model as TypeScript interfaces (machine-readable checkpoints).

Define clocks

TypeScript
/**
 * Define clocks
 * Business hours vs calendar hours; pause when waiting on customer.
 */
export interface CoreFrameworkStep1DefineClocks {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 0;
  /** Display title for this step */
  readonly title: "Define clocks";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep1DefineClocks_NARRATIVE: readonly string[] = [
  "Business hours vs calendar hours; pause when waiting on customer.",
] as const;
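To make the business-hours versus calendar-hours distinction concrete, here is a minimal sketch that adds a budget of business minutes to a start time. The `BusinessCalendar` shape, the 09:00 to 17:00 UTC window, and the Monday to Friday week are assumptions; holidays and per-region calendars are left out.

```typescript
// Hypothetical calendar model; holidays and local time zones are not handled.
export interface BusinessCalendar {
  openHourUtc: number; // e.g. 9
  closeHourUtc: number; // e.g. 17
  workDays: readonly number[]; // Date#getUTCDay values, 0 = Sunday
}

const MS_PER_MINUTE = 60 * 1000;

/** Walk forward day by day, consuming the budget only inside open hours. */
export function addBusinessMinutes(start: Date, minutes: number, cal: BusinessCalendar): Date {
  let remainingMs = minutes * MS_PER_MINUTE;
  let cursorMs = start.getTime();
  // Bound the walk (about ten years) so an empty calendar cannot loop forever.
  for (let guard = 0; guard < 3660; guard++) {
    const day = new Date(cursorMs);
    const y = day.getUTCFullYear();
    const m = day.getUTCMonth();
    const d = day.getUTCDate();
    const openMs = Date.UTC(y, m, d, cal.openHourUtc);
    const closeMs = Date.UTC(y, m, d, cal.closeHourUtc);
    const isWorkDay = cal.workDays.indexOf(day.getUTCDay()) !== -1;
    if (isWorkDay && cursorMs < closeMs) {
      const fromMs = Math.max(cursorMs, openMs);
      const availableMs = closeMs - fromMs;
      if (remainingMs <= availableMs) {
        return new Date(fromMs + remainingMs);
      }
      remainingMs -= availableMs;
    }
    // Jump to the next UTC midnight; Date.UTC normalizes month overflow.
    cursorMs = Date.UTC(y, m, d + 1);
  }
  throw new Error("Calendar has no working time within the search window");
}

const weekdayCalendar: BusinessCalendar = {
  openHourUtc: 9,
  closeHourUtc: 17,
  workDays: [1, 2, 3, 4, 5],
};
// Friday 16:00 UTC plus 120 business minutes lands on Monday 10:00 UTC.
const due = addBusinessMinutes(new Date("2024-03-08T16:00:00Z"), 120, weekdayCalendar);
console.log(due.toISOString()); // "2024-03-11T10:00:00.000Z"
```

With calendar hours the same ticket would be due Friday at 18:00 UTC, which is why the clock type needs to be part of the policy rather than an afterthought.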

Tiered escalation

TypeScript
/**
 * Tiered escalation
 * Owner → manager → executive with different thresholds by tier.
 */
export interface CoreFrameworkStep2TieredEscalation {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 1;
  /** Display title for this step */
  readonly title: "Tiered escalation";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}

export const CoreFrameworkStep2TieredEscalation_NARRATIVE: readonly string[] = [
  "Owner → manager → executive with different thresholds by tier.",
] as const;
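The owner → manager → executive ladder can be sketched as a threshold table evaluated against the fraction of the SLA budget already used. This reads "thresholds by tier" as one threshold per escalation level, which is an interpretation; the 50% / 80% / 100% values and all names are assumptions.

```typescript
export type EscalationTier = "owner" | "manager" | "executive";

export interface TierThreshold {
  tier: EscalationTier;
  /** Fraction of the SLA budget elapsed at which this tier is notified. */
  atFraction: number;
}

// Hypothetical thresholds; must be sorted by ascending atFraction.
const DEFAULT_THRESHOLDS: readonly TierThreshold[] = [
  { tier: "owner", atFraction: 0.5 },
  { tier: "manager", atFraction: 0.8 },
  { tier: "executive", atFraction: 1.0 },
];

/** Return the highest tier whose threshold has been reached, or null if none. */
export function tierFor(
  elapsedMs: number,
  budgetMs: number,
  thresholds: readonly TierThreshold[] = DEFAULT_THRESHOLDS
): EscalationTier | null {
  if (budgetMs <= 0) {
    throw new RangeError("budgetMs must be positive");
  }
  const fraction = elapsedMs / budgetMs;
  let current: EscalationTier | null = null;
  for (const t of thresholds) {
    if (fraction >= t.atFraction) {
      current = t.tier;
    }
  }
  return current;
}

console.log(tierFor(30, 100)); // null
console.log(tierFor(85, 100)); // "manager"
console.log(tierFor(100, 100)); // "executive"
```

Expressing thresholds as fractions of the budget lets the same ladder apply to a one-hour P1 and a one-day P3 without separate tables.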

Detailed breakdown

Logic sections encoded as Python functions with structured narrative payloads.

Noise control

Python
def logic_block_1_noise_control(context: dict) -> dict:
    """Operational logic: Noise control"""
    # Narrative steps from the guide (logic section)
    paragraphs = ["Aggregate related alerts; suppress duplicates; require acknowledgment."]
    return {
        "heading": "Noise control",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }
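A rough TypeScript sketch of those three rules follows. It assumes that "related" means "same ticket", that duplicates are alerts with the same ticket and kind inside a suppression window, and that an acknowledged alert stops repeating. `AlertGate`, `groupByTicket`, and the five-minute window are hypothetical names and values.

```typescript
export interface SlaAlert {
  ticketId: string;
  kind: "warn" | "breach";
  atMs: number;
}

/** Aggregate related alerts: one bucket per ticket, suitable for a digest. */
export function groupByTicket(alerts: readonly SlaAlert[]): { [ticketId: string]: SlaAlert[] } {
  const groups: { [ticketId: string]: SlaAlert[] } = {};
  for (const alert of alerts) {
    const bucket = groups[alert.ticketId] || [];
    bucket.push(alert);
    groups[alert.ticketId] = bucket;
  }
  return groups;
}

/** Suppress duplicates inside a time window and stop once acknowledged. */
export class AlertGate {
  private readonly suppressMs: number;
  private lastSentMs: { [key: string]: number } = {};
  private acknowledged: { [key: string]: boolean } = {};

  constructor(suppressMs: number) {
    this.suppressMs = suppressMs;
  }

  /** Returns true if the alert should be delivered. */
  shouldSend(alert: SlaAlert): boolean {
    const key = alert.ticketId + ":" + alert.kind;
    if (this.acknowledged[key] === true) {
      return false; // someone already owns this alert
    }
    const last = this.lastSentMs[key];
    if (last !== undefined && alert.atMs - last < this.suppressMs) {
      return false; // duplicate inside the suppression window
    }
    this.lastSentMs[key] = alert.atMs;
    return true;
  }

  acknowledge(ticketId: string, kind: SlaAlert["kind"]): void {
    this.acknowledged[ticketId + ":" + kind] = true;
  }
}

const gate = new AlertGate(5 * 60 * 1000);
console.log(gate.shouldSend({ ticketId: "T-1", kind: "warn", atMs: 0 })); // true
console.log(gate.shouldSend({ ticketId: "T-1", kind: "warn", atMs: 60000 })); // false
```

Until an alert is acknowledged it is allowed through again once the window has passed, so unowned tickets keep resurfacing without flooding the channel.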

Technical patterns

Pause buckets

  • When status = `pending_customer`, stop the SLA clock and persist `paused_at`.
  • An inbound customer message clears the pause and resumes the clock.
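The pause and resume rules can be sketched as two pure functions over a small clock record. This assumes a calendar-time clock, where resuming simply pushes the deadline out by the time spent paused; the `SlaClock` shape and function names are hypothetical, with `pausedAtMs` standing in for the persisted `paused_at`.

```typescript
export interface SlaClock {
  /** Breach deadline as epoch milliseconds (UTC). */
  dueAtMs: number;
  /** When the clock was paused, or null while it is running. */
  pausedAtMs: number | null;
}

/** Called when the ticket enters pending_customer. */
export function pauseClock(clock: SlaClock, nowMs: number): SlaClock {
  if (clock.pausedAtMs !== null) {
    return clock; // already paused; keep the original pause time
  }
  return { dueAtMs: clock.dueAtMs, pausedAtMs: nowMs };
}

/** Called when an inbound customer message arrives. */
export function resumeClock(clock: SlaClock, nowMs: number): SlaClock {
  if (clock.pausedAtMs === null) {
    return clock; // nothing to resume
  }
  const pausedForMs = nowMs - clock.pausedAtMs;
  // Shift the deadline by the paused duration so waiting time is not charged.
  return { dueAtMs: clock.dueAtMs + pausedForMs, pausedAtMs: null };
}

let clock: SlaClock = { dueAtMs: 10000, pausedAtMs: null };
clock = pauseClock(clock, 4000);
clock = resumeClock(clock, 7000);
console.log(clock.dueAtMs); // 13000
```

For a business-hours clock, shifting by wall-clock time would be wrong if the pause spans closed hours; recomputing the deadline from the remaining budget is the safer variant there. Any already-scheduled warning job also needs to be rescheduled after the deadline moves.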

Code examples

Schedule escalation job

Enqueue a delayed job that fires at the deadline minus a warning window (15 minutes in this example).

TypeScript
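// Assumes `queue` is a BullMQ-style queue in scope whose add(name, data, opts) takes `delay` in ms.
const WARNING_WINDOW_MS = 15 * 60 * 1000;

export function scheduleEscalation(ticketId: string, dueAt: Date) {
  // Warn 15 minutes before the deadline; clamp so an overdue warning runs immediately.
  const warnAtMs = dueAt.getTime() - WARNING_WINDOW_MS;
  return queue.add('sla-warn', { ticketId }, { delay: Math.max(0, warnAtMs - Date.now()) });
}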

System architecture

[Ticket state changes]
  → [SLA policy lookup]
  → [Deadline calculator]
  → [Scheduler: warn → breach → escalate]
  → [Pager / manager queue]
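The scheduler stage could be sketched as a pure planning function that turns one deadline into three delayed jobs. Only the `sla-warn` job name comes from the example above; `sla-breach`, `sla-escalate`, `planSlaJobs`, and the window parameters are hypothetical.

```typescript
export interface PlannedJob {
  name: "sla-warn" | "sla-breach" | "sla-escalate";
  ticketId: string;
  /** Delay from now in milliseconds; 0 means run immediately. */
  delayMs: number;
}

/**
 * Plan warn → breach → escalate jobs for one ticket.
 * warnWindowMs: how long before the deadline to warn.
 * escalateAfterMs: how long after the breach to escalate further.
 */
export function planSlaJobs(
  ticketId: string,
  dueAtMs: number,
  nowMs: number,
  warnWindowMs: number,
  escalateAfterMs: number
): PlannedJob[] {
  // Clamp so a fire time already in the past becomes an immediate job.
  const delayUntil = (fireAtMs: number): number => Math.max(0, fireAtMs - nowMs);
  return [
    { name: "sla-warn", ticketId: ticketId, delayMs: delayUntil(dueAtMs - warnWindowMs) },
    { name: "sla-breach", ticketId: ticketId, delayMs: delayUntil(dueAtMs) },
    { name: "sla-escalate", ticketId: ticketId, delayMs: delayUntil(dueAtMs + escalateAfterMs) },
  ];
}

// Deadline in 60 minutes, warn 15 minutes early, escalate 30 minutes after breach.
const jobs = planSlaJobs("T-1", 3600000, 0, 900000, 1800000);
console.log(jobs.map((j) => j.name + "@" + j.delayMs));
// [ 'sla-warn@2700000', 'sla-breach@3600000', 'sla-escalate@5400000' ]
```

Keeping the planning step free of queue calls makes it easy to test; a thin wrapper can then hand each planned job to whatever delayed queue is in use. Each job would still need to re-check the ticket state when it fires, since the ticket may have been resolved or paused in the meantime.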

Real-world example

A B2B support org cut SLA misses by 30% by warning at 70% of the time budget, before customers churned.
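The 70% rule is simple arithmetic on the time budget. As an illustrative sketch (the function name and the four-hour budget are assumptions, not details from the example above):

```typescript
/** Warning time at a given fraction of the budget between start and deadline. */
export function warnAtFraction(startMs: number, dueMs: number, fraction: number): number {
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError("fraction must be in (0, 1]");
  }
  if (dueMs <= startMs) {
    throw new RangeError("dueMs must be after startMs");
  }
  // Round to a whole millisecond to absorb floating-point error.
  return startMs + Math.round((dueMs - startMs) * fraction);
}

const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;
const warnMs = warnAtFraction(0, FOUR_HOURS_MS, 0.7);
console.log(warnMs / 60000); // 168 (minutes): warn 2 h 48 min in, 72 minutes before breach
```

A fractional threshold scales with the budget, unlike a fixed window such as "15 minutes before", which is generous for a one-hour SLA and negligible for a three-day one.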

Common mistakes

  • Too many alerts: teams end up muting everything.
  • SLAs defined but never instrumented in systems, which makes them impossible to audit.

PrimeAxiom implements SLA engines with CRM tasks and messaging—book a playbook review.