Set Up Alerts

How to configure alert rules — consecutive failures, multi-location correlation, SSL alerts, and SLO burn-rate alerts.

Alerts notify you when monitors detect problems. Each alert rule combines one or more conditions (what triggers the alert) with one or more channels (where the notification goes).

Define alert channels

To send notifications, first define your channels in the alertChannels block at the top of yorker.config.yaml. Each channel has a name (the key) and a type-specific configuration.

alertChannels:
  ops-slack:
    type: slack
    webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}"

  on-call-email:
    type: email
    addresses:
      - [email protected]
      - [email protected]

  pagerduty:
    type: webhook
    url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}"
    method: POST
    headers:
      Authorization: "Token token={{secrets.PD_TOKEN}}"

Channel types

| Type | Required fields | Description |
| --- | --- | --- |
| slack | webhookUrl | Posts to a Slack incoming webhook. |
| email | addresses (array, at least one) | Sends email to the listed addresses. |
| webhook | url | Sends an HTTP request. method defaults to POST. Optional headers for auth. |

Reference channels in alerts

To attach a channel to an alert, reference it with the @channel-name syntax:

monitors:
  - name: API Health
    type: http
    url: https://api.example.com/health
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 3
        channels:
          - "@ops-slack"
          - "@on-call-email"

Alert conditions

Each alert must have at least one condition. Multiple conditions on the same alert are combined with AND logic — all conditions must be met for the alert to trigger.
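For instance, a rule with both a failure-count condition and a location condition (a sketch; the thresholds are illustrative) fires only once the monitor has failed three times in a row and at least two locations observed failures:

```yaml
alerts:
  - conditions:
      - type: consecutive_failures
        count: 3
      - type: multi_location_failure
        minLocations: 2
    channels:
      - "@ops-slack"
```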

consecutive_failures

Triggers after a monitor fails a specified number of times in a row.

- type: consecutive_failures
  count: 3          # default: 2, min: 1

response_time_threshold

Triggers when response time exceeds a threshold.

- type: response_time_threshold
  maxMs: 5000       # milliseconds

multi_location_failure

Triggers when a monitor fails from multiple locations within a time window. This reduces false positives from localized network issues.

- type: multi_location_failure
  minLocations: 2   # default: 2, min: 2
  windowSeconds: 300 # default: 300 (5 minutes)

ssl_expiry

Triggers when an SSL certificate is approaching expiration.

- type: ssl_expiry
  daysBeforeExpiry: 14  # default: 14, min: 1
  severity: warning     # optional: critical | warning | info

ssl_certificate_changed

Triggers when the leaf certificate's fingerprint changes between runs — useful for catching unexpected cert rotations and possible man-in-the-middle conditions.

- type: ssl_certificate_changed
  severity: critical

ssl_self_signed

Triggers when the endpoint presents a self-signed (or otherwise untrusted-root) certificate.

- type: ssl_self_signed
  severity: critical

ssl_protocol_deprecated

Triggers when the TLS handshake negotiates a protocol older than minProtocol.

- type: ssl_protocol_deprecated
  minProtocol: TLSv1.2   # default: TLSv1.2 (allowed: TLSv1.2, TLSv1.3)
  severity: warning

burn_rate

Triggers when an SLO's error budget is burning faster than a threshold across a short window AND a long window (the Google SRE multi-window burn-rate alerting pattern). Requires an existing SLO — reference it by ID.

- type: burn_rate
  sloId: slo_abc123
  burnRateThreshold: 14.4   # burn rate multiple (e.g. 14.4 = budget exhausted in ~2 days at a 30d SLO)
  longWindowMinutes: 60     # minimum 60
  shortWindowMinutes: 5     # minimum 5, MUST be less than longWindowMinutes

Burn-rate alerts are automatically wired up when you set burnRateAlerts: true on an SLO (the default). Use a manual burn_rate condition only if you need custom threshold/window combinations beyond the built-in ones. See Define SLOs for the simpler path.
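As a sketch of the manual path (the monitor name, SLO ID, and channel are placeholders), a custom fast-burn rule attached to a monitor might look like:

```yaml
monitors:
  - name: API Health
    type: http
    url: https://api.example.com/health
    alerts:
      - name: api-fast-burn
        conditions:
          - type: burn_rate
            sloId: slo_abc123
            burnRateThreshold: 14.4
            longWindowMinutes: 60
            shortWindowMinutes: 5
        channels:
          - "@pagerduty"
```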

baseline_anomaly

Triggers when a performance metric drifts away from its learned baseline for several consecutive runs. Baselines are stored per (check, location, hour-of-day, day-of-week) bucket so a monitor that's slower on Monday mornings doesn't trip the alert every Monday.

- type: baseline_anomaly
  metric: response_time       # required
  sigmaThreshold: 3           # default: 3 (min: 2, max: 10)
  consecutiveCount: 3         # default: 3 (min: 2, max: 20, integer)
  direction: above            # default: above (allowed: above | below | both)
  severity: warning           # default: warning

Supported metrics. HTTP: response_time, dns_lookup, tls_handshake, ttfb, content_transfer. Browser: lcp, fcp, cls.

How the chain works. On each result ingestion the engine reads the last N runs for this check+location, regardless of status. The alert fires only if all N are successful AND each deviates by more than sigmaThreshold·σ from its own time-bucketed baseline in the configured direction. Any non-success run inside the window breaks the chain, so this alert stays scoped to drift-style regressions rather than outages. Failures are not skipped over to reach earlier successes: the window simply slides forward until it again contains N successes.

Pick a reasonable threshold. 3σ / 3 consecutive is a conservative starting point: under the normal assumption (and assuming run-to-run independence), the per-run false-positive rate at 3σ is ≈1-in-740 for one-sided checks (direction: above or below, the default) and ≈1-in-370 for two-sided (direction: both). Across 3 consecutive runs that compounds to ≈1-in-400-million one-sided or ≈1-in-50-million two-sided. In practice runs sharing a time bucket carry correlated noise (network conditions, regional perturbations), so treat the compounded figure as a theoretical ceiling. Tightening to 4σ / 5 consecutive buys near-zero false positives; loosening to 2σ / 2 consecutive is effectively a point-anomaly detector.

Direction. above catches slowdowns (the common case for response-time metrics). below catches suspiciously-fast responses, which often indicate the runner short-circuiting past the real work (stale cache hits, 304 storms, redirect chains being skipped). both is useful for CLS-style vitals where either side is a UX regression.
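For a layout-shift vital, a two-sided rule could be written as follows (a sketch; the sigma and count values simply restate the defaults above):

```yaml
- type: baseline_anomaly
  metric: cls
  direction: both          # alert on drift in either direction
  sigmaThreshold: 3
  consecutiveCount: 3
  severity: warning
```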

Severity

All SSL-related conditions (including ssl_expiry), mcp_schema_drift, and baseline_anomaly accept an optional severity field with value critical, warning, or info. Severity is stored on the resulting alert instance and surfaces in the alerts dashboard: use it to distinguish "nice to know" rotations from genuine outages. mcp_schema_drift and baseline_anomaly default to warning (set by the shared schema); SSL conditions have no schema default and fall back to critical via the evaluator.

Cascading alerts

Alerts follow the same cascade as other monitor settings: defaults -> group -> monitor. Define alerts at any level:

defaults:
  alerts:
    - conditions:
        - type: consecutive_failures
          count: 2
      channels:
        - "@ops-slack"

groups:
  - name: Critical APIs
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 1
        channels:
          - "@ops-slack"
          - "@pagerduty"
    monitors:
      - name: Payments API
        type: http
        url: https://api.example.com/payments

When a monitor defines its own alerts, those replace the inherited alerts entirely. To clear inherited alerts, set alerts: [] on the monitor.
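A monitor can opt out of a group's rules entirely with an empty list (a sketch; the monitor name and URL are illustrative):

```yaml
groups:
  - name: Critical APIs
    monitors:
      - name: Status Page
        type: http
        url: https://status.example.com
        alerts: []   # clears all inherited alert rules for this monitor
```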

Multi-tier alerting

To escalate alerts based on severity, define multiple alert rules with different conditions and channels:

monitors:
  - name: Checkout Flow
    type: browser
    script: ./monitors/checkout.ts
    alerts:
      # Tier 1: Slack for initial failures
      - name: checkout-warning
        conditions:
          - type: consecutive_failures
            count: 2
        channels:
          - "@ops-slack"

      # Tier 2: PagerDuty for persistent multi-location failures
      - name: checkout-critical
        conditions:
          - type: consecutive_failures
            count: 5
          - type: multi_location_failure
            minLocations: 3
        channels:
          - "@pagerduty"
          - "@on-call-email"

      # SSL expiry: early warning
      - name: checkout-ssl
        conditions:
          - type: ssl_expiry
            daysBeforeExpiry: 30
            severity: warning
        channels:
          - "@ops-slack"

      # SSL rotation detection
      - name: checkout-ssl-rotation
        conditions:
          - type: ssl_certificate_changed
            severity: info
        channels:
          - "@ops-slack"

OTel trace linking

When an alert fires, Yorker includes the OpenTelemetry trace ID in the notification payload. If your application propagates the W3C traceparent header, you can jump directly from an alert to the distributed trace in your observability backend (e.g., HyperDX, Jaeger, Grafana Tempo) to identify root cause.

Web UI

To create alerts through the dashboard:

  1. Navigate to a monitor's detail page.
  2. Click Add Alert Rule.
  3. Select one or more conditions and configure thresholds.
  4. Choose notification channels (create them in Settings > Notification Channels if needed).
  5. Click Save.

Alert rules created in the Web UI and the CLI are the same underlying resource. The CLI's yorker deploy command will detect and diff against rules created through the UI, and abort on drift unless you pass --force or --accept-remote.

You can also view all alerts across monitors from the Alerts page in the dashboard.

CLI alert management

In addition to defining alerts in yorker.config.yaml, you can manage alert instances directly from the command line.

List active alerts

yorker alerts list

Include resolved and recovered alerts with --all, or filter by monitor:

yorker alerts list --monitor "Homepage" --all

Acknowledge and resolve

yorker alerts ack ainst_abc123
yorker alerts resolve ainst_abc123

View alert history

yorker alerts history --since 7d

Create alert rules imperatively

yorker alerts rules create \
  --monitor "Homepage" \
  --condition "consecutive_failures >= 3" \
  --channel nch_abc123 \
  --name "homepage-down"

Baseline-deviation rules use baseline_anomaly:<metric> (defaults to 3σ, 3 consecutive, above) or the explicit baseline_anomaly:<metric>@<sigma>σ:<consecutive>[:above|below|both] form:

yorker alerts rules create \
  --monitor "Checkout API" \
  --condition "baseline_anomaly:response_time" \
  --channel nch_abc123 \
  --severity warning

yorker alerts rules create \
  --monitor "Marketing site" \
  --condition "baseline_anomaly:lcp@4σ:5:above" \
  --channel nch_pagerduty \
  --severity critical

See the CLI reference for the full list of alert commands and condition formats.