SLA Management and Escalation Automation
Overview
SLAs are promises. Automation should measure breach risk before a breach occurs, escalate with context, and log the reasons for later retrospectives.
Quick definition
SLA automation stores pause/resume semantics (for example, waiting on the customer), computes breach deadlines in UTC, and escalates through time-based jobs rather than cron jobs that naively poll entire tables.
Definition
SLA automation combines business calendars, pause rules, priority matrices, and escalation paths across channels (email, SMS, Slack).
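As a rough illustration of how a business calendar could feed the deadline calculation, the sketch below walks forward through business windows until the time budget is used up. It assumes a single Monday to Friday, 09:00 to 17:00 UTC calendar with no holidays; `BUSINESS_START`, `BUSINESS_END`, and `business_deadline` are hypothetical names, not part of any particular SLA engine.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical calendar: Monday-Friday, 09:00-17:00 UTC, no holidays.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def business_deadline(opened_at: datetime, budget: timedelta) -> datetime:
    """Return the UTC instant at which `budget` of business time has elapsed."""
    if opened_at.tzinfo is None:
        raise ValueError("opened_at must be timezone-aware")
    cursor = opened_at.astimezone(timezone.utc)
    remaining = budget
    while True:
        day_start = datetime.combine(cursor.date(), BUSINESS_START, tzinfo=timezone.utc)
        day_end = datetime.combine(cursor.date(), BUSINESS_END, tzinfo=timezone.utc)
        if cursor.weekday() < 5 and cursor < day_end:
            # The clock only runs inside the business window.
            cursor = max(cursor, day_start)
            available = day_end - cursor
            if remaining <= available:
                return cursor + remaining
            remaining -= available
        # Jump to the opening time of the next calendar day and try again.
        next_day = cursor.date() + timedelta(days=1)
        cursor = datetime.combine(next_day, BUSINESS_START, tzinfo=timezone.utc)
```

Under these assumptions, a ticket opened at 16:00 UTC on a Friday with a two-hour budget comes due at 10:00 UTC the following Monday. A production calendar would also need holidays and per-customer time zones.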
Why it matters
Reactive firefighting burns teams out; proactive escalation retains customers and preserves the data needed for improvement.
Core framework
Step-by-step model as TypeScript interfaces (machine-readable checkpoints).
Define clocks
/**
 * Define clocks
 * Business hours vs calendar hours; pause when waiting on customer.
 */
export interface CoreFrameworkStep1DefineClocks {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 0;
  /** Display title for this step */
  readonly title: "Define clocks";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}
export const CoreFrameworkStep1DefineClocks_NARRATIVE: readonly string[] = [
  "Business hours vs calendar hours; pause when waiting on customer."
] as const;

Tiered escalation
/**
 * Tiered escalation
 * Owner → manager → executive with different thresholds by tier.
 */
export interface CoreFrameworkStep2TieredEscalation {
  /** Order in the core framework (0-based) */
  readonly stepIndex: 1;
  /** Display title for this step */
  readonly title: "Tiered escalation";
  /** Narrative checkpoints as published in the guide */
  readonly narrative: readonly string[];
}
export const CoreFrameworkStep2TieredEscalation_NARRATIVE: readonly string[] = [
  "Owner → manager → executive with different thresholds by tier."
] as const;

Detailed breakdown
Logic sections encoded as Python functions with structured narrative payloads.
Noise control
def logic_block_1_noise_control(context: dict) -> dict:
    """Operational logic: Noise control"""
    # Narrative steps from the guide (logic section)
    paragraphs = ["Aggregate related alerts; suppress duplicates; require acknowledgment."]
    return {
        "heading": "Noise control",
        "paragraphs": paragraphs,
        "context_keys": tuple(sorted(context.keys())),
    }

Technical patterns
Pause buckets
- When status = `pending_customer`, stop the SLA clock and persist `paused_at`.
- On an inbound customer message, clear the pause and resume the clock.
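The pause and resume rules above could be modelled roughly as follows. This is a sketch under one assumption the guide does not spell out: that resuming pushes the deadline out by the time spent paused. `SlaClock` and `due_at` are hypothetical names; only `paused_at` comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SlaClock:
    """Hypothetical per-ticket SLA state; only `paused_at` is named in the guide."""
    due_at: datetime                      # breach deadline, stored in UTC
    paused_at: Optional[datetime] = None  # set while status is pending_customer

    def pause(self, now: datetime) -> None:
        # Idempotent: a repeated pause must not overwrite the original paused_at.
        if self.paused_at is None:
            self.paused_at = now

    def resume(self, now: datetime) -> None:
        # Assumption: time spent waiting on the customer is added to the deadline.
        if self.paused_at is not None:
            self.due_at += now - self.paused_at
            self.paused_at = None
```

Making both operations idempotent matters in practice, because status-change events and inbound-message events can arrive more than once or out of order.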
Code examples
Schedule escalation job
Enqueue a delayed job that fires at the deadline minus the warning window.
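// Assumes `queue` is a delayed-job queue exposing add(name, data, { delay }).
export function scheduleEscalation(ticketId, dueAt) {
  const WARNING_WINDOW_MS = 15 * 60 * 1000;
  const warnAtMs = new Date(dueAt).getTime() - WARNING_WINDOW_MS;
  // Clamp to zero so a warning time already in the past fires immediately.
  const delay = Math.max(0, warnAtMs - Date.now());
  return queue.add('sla-warn', { ticketId }, { delay });
}

System architecture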
[Ticket state changes]
→ [SLA policy lookup]
→ [Deadline calculator]
→ [Scheduler: warn → breach → escalate]
→ [Pager / manager queue]

Real-world example
A B2B support org cut SLA misses by 30% by warning at 70% of the time budget, before customers churned.
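A warning at 70% of the time budget extends naturally to the owner, manager, and executive tiers from the core framework. The sketch below expresses each tier as a percentage of the budget; only the 70% figure comes from the example, while the 90% and 100% values and the names `ESCALATION_POINTS` and `escalation_schedule` are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Percentage of the SLA time budget at which each tier is notified.
# Only the 70% warning comes from the example; the other values are illustrative.
ESCALATION_POINTS = [(70, "owner"), (90, "manager"), (100, "executive")]


def escalation_schedule(started_at: datetime, due_at: datetime) -> List[Tuple[datetime, str]]:
    """Return (fire_at, tier) pairs spread across the ticket's time budget."""
    budget = due_at - started_at
    return [
        (started_at + budget * percent / 100, tier)
        for percent, tier in ESCALATION_POINTS
    ]
```

For a ten-hour budget this yields an owner warning at hour 7, a manager escalation at hour 9, and an executive escalation at the deadline itself. Each pair can then be handed to a delayed-job scheduler.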
Common mistakes
- Too many alerts: teams mute everything.
- SLAs defined but not instrumented in systems, which makes them impossible to audit.
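One way to avoid the alert flood in the first mistake is the noise-control rule from the detailed breakdown: aggregate related alerts, suppress duplicates, and require acknowledgment. The following is a minimal in-memory sketch; `AlertGroup` and `AlertAggregator` are hypothetical names, and grouping by ticket and alert kind is an assumption about what counts as "related".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AlertGroup:
    """Alerts for one ticket and alert kind, collapsed into a single notification."""
    ticket_id: str
    kind: str
    count: int = 0
    acknowledged: bool = False


@dataclass
class AlertAggregator:
    """Hypothetical in-memory aggregator; a real system would persist this state."""
    groups: Dict[Tuple[str, str], AlertGroup] = field(default_factory=dict)

    def record(self, ticket_id: str, kind: str) -> bool:
        """Record an alert; return True only if a notification should be sent."""
        key = (ticket_id, kind)
        group = self.groups.setdefault(key, AlertGroup(ticket_id, kind))
        group.count += 1
        # Notify on the first alert of a group; later duplicates are only counted.
        return group.count == 1

    def acknowledge(self, ticket_id: str, kind: str) -> None:
        """Mark a group as acknowledged by a human."""
        self.groups[(ticket_id, kind)].acknowledged = True

    def unacknowledged(self) -> List[AlertGroup]:
        """Groups still waiting for a human acknowledgment."""
        return [g for g in self.groups.values() if not g.acknowledged]
```

The duplicate count is kept rather than discarded so that a retrospective can still see how noisy each ticket was.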