---
title: 'Set Up Alerts'
description: 'How to configure alert rules — consecutive failures, multi-location correlation, SSL alerts, and SLO burn-rate alerts.'
section: 'Guides'
canonical_url: 'https://yorkermonitoring.com/docs/guides/set-up-alerts'
---

# Set Up Alerts

Alerts notify you when monitors detect problems. Each alert rule combines one or more **conditions** (what triggers the alert) with one or more **channels** (where the notification goes).

## Define alert channels

To send notifications, first define your channels in the `alertChannels` block at the top of `yorker.config.yaml`. Each channel has a name (the key) and a type-specific configuration.

```yaml
alertChannels:
  ops-slack:
    type: slack
    webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}"

  on-call-email:
    type: email
    addresses:
      - oncall@example.com
      - sre-team@example.com

  pagerduty-oncall:
    type: pagerduty
    routingKey: "{{secrets.PAGERDUTY_ROUTING_KEY}}"

  servicenow-prod:
    type: servicenow
    instanceUrl: "https://acme.service-now.com"
    username: "{{secrets.SNOW_USER}}"
    password: "{{secrets.SNOW_PASSWORD}}"

  generic-webhook:
    type: webhook
    url: "{{secrets.OPSGENIE_WEBHOOK_URL}}"
    method: POST
    headers:
      Authorization: "GenieKey {{secrets.OPSGENIE_KEY}}"
```

### Channel types

| Type | Required fields | Description |
|------|----------------|-------------|
| `slack` | `webhookUrl` | Posts to a Slack incoming webhook. |
| `email` | `addresses` (array, at least one) | Sends email to the listed addresses. |
| `pagerduty` | `routingKey` | PagerDuty Events API v2. Optional `serviceRegion` (`us` or `eu`). |
| `servicenow` | `instanceUrl`, `username`, `password` | ServiceNow Table API. Optional `assignmentGroup`. |
| `webhook` | `url` | Sends an HTTP request to any URL. `method` defaults to `POST`. Optional `headers` for auth. Use this for Opsgenie, custom integrations, or anything not covered by the dedicated channel types. |
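
The optional fields from the table slot in alongside the required ones. A sketch combining them (the secret names, instance URL, and group value are placeholders):

```yaml
alertChannels:
  pagerduty-eu:
    type: pagerduty
    routingKey: "{{secrets.PAGERDUTY_EU_KEY}}"
    serviceRegion: eu          # optional: us or eu

  servicenow-prod:
    type: servicenow
    instanceUrl: "https://acme.service-now.com"
    username: "{{secrets.SNOW_USER}}"
    password: "{{secrets.SNOW_PASSWORD}}"
    assignmentGroup: "SRE"     # optional
```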

## Reference channels in alerts

To attach a channel to an alert, reference it with the `@channel-name` syntax:

```yaml
monitors:
  - name: API Health
    type: http
    url: https://api.example.com/health
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 3
        channels:
          - "@ops-slack"
          - "@on-call-email"
```

## Alert conditions

Each alert must have at least one condition. Multiple conditions on the same alert are combined with AND logic — all conditions must be met for the alert to trigger.
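
For example, the rule below fires only when the endpoint has failed three runs in a row **and** is responding slowly. Each rule in the `alerts` list evaluates independently, so splitting conditions across separate rules gives OR-style behavior instead. A sketch using the channels defined earlier:

```yaml
alerts:
  # Fires only when BOTH conditions hold
  - conditions:
      - type: consecutive_failures
        count: 3
      - type: response_time_threshold
        maxMs: 5000
    channels:
      - "@ops-slack"
```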

### consecutive_failures

Triggers after a monitor fails a specified number of times in a row.

```yaml
- type: consecutive_failures
  count: 3          # default: 2, min: 1
```

### response_time_threshold

Triggers when response time exceeds a threshold.

```yaml
- type: response_time_threshold
  maxMs: 5000       # milliseconds
```

### multi_location_failure

Triggers when a monitor fails from multiple locations within a time window. This reduces false positives from localized network issues.

```yaml
- type: multi_location_failure
  minLocations: 2      # default: 2, min: 2
  windowSeconds: 300   # default: 300 (5 minutes)
```

### ssl_expiry

Triggers when an SSL certificate is approaching expiration.

```yaml
- type: ssl_expiry
  daysBeforeExpiry: 14  # default: 14, min: 1
  severity: warning     # optional: critical | warning | info
```

### ssl_certificate_changed

Triggers when the leaf certificate's fingerprint changes between runs — useful for catching unexpected cert rotations and possible man-in-the-middle conditions.

```yaml
- type: ssl_certificate_changed
  severity: critical
```

### ssl_self_signed

Triggers when the endpoint presents a self-signed (or otherwise untrusted-root) certificate.

```yaml
- type: ssl_self_signed
  severity: critical
```

### ssl_protocol_deprecated

Triggers when the TLS handshake negotiates a protocol older than `minProtocol`.

```yaml
- type: ssl_protocol_deprecated
  minProtocol: TLSv1.2   # default: TLSv1.2 (allowed: TLSv1.2, TLSv1.3)
  severity: warning
```

### burn_rate

Triggers when an SLO's error budget is burning faster than a threshold across a short window AND a long window (the Google SRE multi-window burn-rate alerting pattern). Requires an existing SLO — reference it by ID.

```yaml
- type: burn_rate
  sloId: slo_abc123
  burnRateThreshold: 14.4   # burn-rate multiple (e.g. 14.4 = budget exhausted in ~2 days for a 30d SLO)
  longWindowMinutes: 60     # minimum 60
  shortWindowMinutes: 5     # minimum 5, MUST be less than longWindowMinutes
```

Burn-rate alerts are automatically wired up when you set `burnRateAlerts: true` on an SLO (the default). Use a manual `burn_rate` condition only if you need custom threshold/window combinations beyond the built-in ones. See [Define SLOs](/docs/guides/define-slos) for the simpler path.
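
As a sketch, here is a custom fast-burn / slow-burn pair following the classic multi-window recipe (14.4x over 1h/5m pages, 6x over 6h/30m posts to Slack; the SLO ID and channel names are placeholders):

```yaml
alerts:
  - name: payments-fast-burn
    conditions:
      - type: burn_rate
        sloId: slo_abc123
        burnRateThreshold: 14.4
        longWindowMinutes: 60
        shortWindowMinutes: 5
    channels:
      - "@pagerduty-oncall"

  - name: payments-slow-burn
    conditions:
      - type: burn_rate
        sloId: slo_abc123
        burnRateThreshold: 6
        longWindowMinutes: 360
        shortWindowMinutes: 30
    channels:
      - "@ops-slack"
```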

### baseline_anomaly

Triggers when a performance metric drifts away from its learned baseline for several consecutive runs. Baselines are stored per `(check, location, hour-of-day, day-of-week)` bucket so a monitor that's slower on Monday mornings doesn't trip the alert every Monday.

```yaml
- type: baseline_anomaly
  metric: response_time       # required
  sigmaThreshold: 3           # default: 3 (min: 2, max: 10)
  consecutiveCount: 3         # default: 3 (min: 2, max: 20, integer)
  direction: above            # default: above (allowed: above | below | both)
  severity: warning           # default: warning
```

**Supported metrics.** HTTP: `response_time`, `dns_lookup`, `tls_handshake`, `ttfb`, `content_transfer`. Browser: `lcp`, `fcp`, `cls`.

**How the chain works.** On each result ingestion, the engine reads the last N runs (N = `consecutiveCount`) for the check+location pair, regardless of status. The alert fires only if all N are successful AND each deviates by more than `sigmaThreshold`·σ from its own time-bucketed baseline in the configured direction. Any non-success run inside the window breaks the chain, so this alert stays scoped to drift-style regressions rather than outages. Failures are not skipped over to reach earlier successes: the window simply slides forward until it again contains N successes.

**Pick a reasonable threshold.** 3σ / 3 consecutive is a conservative starting point: under the normal assumption (and assuming run-to-run independence), the per-run false-positive rate at 3σ is ≈1-in-740 for one-sided checks (`direction: above` or `below`, the default) and ≈1-in-370 for two-sided (`direction: both`). Across 3 consecutive runs that compounds to ≈1-in-400-million one-sided or ≈1-in-50-million two-sided. In practice runs sharing a time bucket carry correlated noise (network conditions, regional perturbations), so treat the compounded figure as a theoretical ceiling. Tightening to 4σ / 5 consecutive buys near-zero false positives; loosening to 2σ / 2 consecutive is effectively a point-anomaly detector.

**Direction.** `above` catches slowdowns (the common case for response-time metrics). `below` catches suspiciously fast responses, which often indicate the runner short-circuiting past the real work (stale cache hits, 304 storms, redirect chains being skipped). `both` is useful for CLS-style vitals, where a shift in either direction signals an unexpected page change worth investigating.
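
Putting the pieces together, a sketch of a browser monitor watching CLS in both directions (the monitor name and script path are placeholders):

```yaml
monitors:
  - name: Marketing Home
    type: browser
    script: ./monitors/home.ts
    alerts:
      - name: home-cls-drift
        conditions:
          - type: baseline_anomaly
            metric: cls
            direction: both      # a shift either way is worth a look
            sigmaThreshold: 3
            consecutiveCount: 3
        channels:
          - "@ops-slack"
```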

### Severity

All SSL-related conditions (including `ssl_expiry`), `mcp_schema_drift`, and `baseline_anomaly` accept an optional `severity` field with value `critical`, `warning`, or `info`. Severity is stored on the resulting alert instance and surfaces in the alerts dashboard: use it to distinguish "nice to know" rotations from genuine outages. `mcp_schema_drift` and `baseline_anomaly` default to `warning` (set by the shared schema); SSL conditions have no schema default and fall back to `critical` via the evaluator.

## Cascading alerts

Alerts follow the same cascade as other monitor settings: **defaults -> group -> monitor**. Define alerts at any level:

```yaml
defaults:
  alerts:
    - conditions:
        - type: consecutive_failures
          count: 2
      channels:
        - "@ops-slack"

groups:
  - name: Critical APIs
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 1
        channels:
          - "@ops-slack"
          - "@pagerduty-oncall"
    monitors:
      - name: Payments API
        type: http
        url: https://api.example.com/payments
```

When a monitor defines its own `alerts`, those **replace** the inherited alerts entirely. To clear inherited alerts, set `alerts: []` on the monitor.
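
For example, to opt a single monitor out of all inherited alerts (names are placeholders):

```yaml
groups:
  - name: Critical APIs
    monitors:
      - name: Status Page
        type: http
        url: https://status.example.com
        alerts: []   # clears alerts inherited from the group and defaults
```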

## Multi-tier alerting

To escalate alerts based on severity, define multiple alert rules with different conditions and channels:

```yaml
monitors:
  - name: Checkout Flow
    type: browser
    script: ./monitors/checkout.ts
    alerts:
      # Tier 1: Slack for initial failures
      - name: checkout-warning
        conditions:
          - type: consecutive_failures
            count: 2
        channels:
          - "@ops-slack"

      # Tier 2: PagerDuty for persistent multi-location failures
      - name: checkout-critical
        conditions:
          - type: consecutive_failures
            count: 5
          - type: multi_location_failure
            minLocations: 3
        channels:
          - "@pagerduty-oncall"
          - "@on-call-email"

      # SSL expiry: early warning
      - name: checkout-ssl
        conditions:
          - type: ssl_expiry
            daysBeforeExpiry: 30
            severity: warning
        channels:
          - "@ops-slack"

      # SSL rotation detection
      - name: checkout-ssl-rotation
        conditions:
          - type: ssl_certificate_changed
            severity: info
        channels:
          - "@ops-slack"
```

## OTel trace linking

When an alert fires, Yorker includes the OpenTelemetry trace ID in the notification payload. If your application propagates the W3C `traceparent` header, you can jump directly from an alert to the distributed trace in your observability backend (e.g., HyperDX, Jaeger, Grafana Tempo) to identify root cause.
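
For reference, the W3C `traceparent` header carries the trace ID as its second field (the values below are the spec's example, not real IDs):

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```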

## Web UI

To create alerts through the dashboard:

1. Navigate to a monitor's detail page.
2. Click **Add Alert Rule**.
3. Select one or more conditions and configure thresholds.
4. Choose notification channels (create them in **Settings > Notification Channels** if needed).
5. Click **Save**.

Alert rules created in the Web UI and the CLI are the same underlying resource. The CLI's `yorker deploy` command detects rules created through the UI, diffs them against your local config, and aborts on drift unless you pass `--force` or `--accept-remote`.
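
If `deploy` aborts on drift, rerun it with one of the flags (a sketch; the flag semantics here are assumed from their names, see the CLI reference for specifics):

```bash
# Keep the UI-edited rules where they conflict with the config (assumed semantics)
yorker deploy --accept-remote

# Overwrite remote rules with the local config (assumed semantics)
yorker deploy --force
```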

You can also view all alerts across monitors from the **Alerts** page in the dashboard.

## CLI alert management

In addition to defining alerts in `yorker.config.yaml`, you can manage alert instances directly from the command line.

### List active alerts

```bash
yorker alerts list
```

Include resolved and recovered alerts with `--all`, or filter by monitor:

```bash
yorker alerts list --monitor "Homepage" --all
```

### Acknowledge and resolve

```bash
yorker alerts ack ainst_abc123
yorker alerts resolve ainst_abc123
```

### View alert history

```bash
yorker alerts history --since 7d
```

### Create alert rules imperatively

```bash
yorker alerts rules create \
  --monitor "Homepage" \
  --condition "consecutive_failures >= 3" \
  --channel nch_abc123 \
  --name "homepage-down"
```

Baseline-deviation rules use `baseline_anomaly:<metric>` (defaults to 3σ, 3 consecutive, above) or the explicit `baseline_anomaly:<metric>@<sigma>σ:<consecutive>[:above|below|both]` form:

```bash
yorker alerts rules create \
  --monitor "Checkout API" \
  --condition "baseline_anomaly:response_time" \
  --channel nch_abc123 \
  --severity warning

yorker alerts rules create \
  --monitor "Marketing site" \
  --condition "baseline_anomaly:lcp@4σ:5:above" \
  --channel nch_pagerduty \
  --severity critical
```

See the [CLI reference](/docs/reference/cli) for the full list of alert commands and condition formats.
