# Yorker Documentation (Full) Complete documentation for Yorker synthetic monitoring platform. For a curated summary, see https://yorkermonitoring.com/llms.txt # Getting Started ## Quick Start > Create your first monitor and start monitoring in under 5 minutes. # Quick Start This guide walks you through creating your first synthetic monitor with Yorker. You can use the Web UI for a visual approach or the CLI for infrastructure-as-code workflows. ## Sign up To get started, create an account at [yorkermonitoring.com](https://yorkermonitoring.com). After signing in, you land on the dashboard where you can create monitors, view results, and configure alerts. ## Pick a monitor type Yorker supports three monitor types: - **HTTP** — sends an HTTP request and asserts on the response (status, body, headers, timing, SSL, OpenAPI conformance). - **Browser** — runs a real Chromium session, either against a URL with automatic Core Web Vitals capture, or scripted with Playwright. - **MCP** — exercises a Model Context Protocol server over Streamable HTTP, validates the advertised tools, and (optionally) calls them with expected output. You can create monitors through the Web UI, the CLI, or via natural language. ## Create your first monitor (Web UI) 1. Open the dashboard and click **Create Monitor**. 2. Select **HTTP**, **Browser**, or **MCP**. 3. Enter the target URL (or MCP endpoint). 4. Choose one or more check locations. 5. Set the check frequency (default: every 5 minutes). 6. Click **Create**. The monitor starts running immediately. Results appear on the dashboard within one check interval. For a richer walkthrough — assertions, auth, browser scripts, MCP tool calls — see [Create a Monitor](/docs/guides/create-monitor). ## Create your first monitor (CLI) To manage monitors as code, use the Yorker CLI. 
### Install the CLI ```bash npm install -g @yorker/cli ``` ### Set your API key Generate an API key from **Settings > API Keys** in the dashboard, then export it: ```bash export YORKER_API_KEY=sk_... ``` ### Scaffold a config file ```bash yorker init ``` The interactive wizard creates a `yorker.config.yaml` with your project name, first monitor URL, type, and frequency. You can also pass flags to skip the prompts: ```bash yorker init --name my-app --url https://example.com --type http --frequency 5m ``` ### Deploy ```bash yorker deploy ``` The CLI validates your config, computes a diff against the current remote state, and applies changes. You will see a Terraform-style plan showing what will be created, updated, or deleted before any changes are applied. ### View results Check the status of your monitors: ```bash yorker status ``` Or stream results in real time: ```bash yorker results tail "Homepage" ``` Open the dashboard to see check results, response times, and status history. Each check result includes timing breakdowns (DNS, TLS, TTFB) and, for browser monitors, a filmstrip screenshot view. ## Next steps - [Create a Monitor](/docs/guides/create-monitor) — HTTP, browser, and MCP monitors, assertions, auth, labels. - [Set Up Alerts](/docs/guides/set-up-alerts) — consecutive failures, multi-location correlation, SSL alerts, burn-rate alerts. - [Define SLOs](/docs/guides/define-slos) — availability and performance objectives with error budgets and burn rate alerts. - [Deploy with CLI](/docs/guides/monitoring-as-code) — manage your full monitoring config as code with `yorker deploy`. - [Install Dashboards](/docs/guides/install-dashboards) — one-click provisioning of pre-built HyperDX / ClickStack dashboards. --- # Guides ## Create a Monitor > How to create HTTP, browser, and MCP monitors using the Web UI, CLI, or natural language. 
# Create a Monitor Yorker supports three monitor types: - **HTTP** — sends an HTTP request and asserts on the response (status code, body, headers, timing, SSL expiry, OpenAPI conformance). - **Browser** — runs a real Chromium session, either against a URL (automatic Core Web Vitals + screenshots) or scripted with Playwright TypeScript. - **MCP** — exercises a Model Context Protocol server over Streamable HTTP, validates the advertised tool list, and (optionally) calls tools with expected output. There are three ways to create monitors: the Web UI, the CLI (monitoring as code), and natural-language generation. ## Web UI To create a monitor through the dashboard: 1. Navigate to the dashboard and click **Create Monitor**. 2. Select the monitor type (**HTTP**, **Browser**, or **MCP**). 3. For HTTP monitors, fill in the URL, method, headers, authentication, and assertions. 4. For browser monitors, choose **URL mode** (just a URL) or **Scripted mode** and write a Playwright TypeScript script in the built-in editor. 5. For MCP monitors, enter the endpoint and optionally list expected tools and test calls. 6. Choose check locations and frequency. 7. Click **Create**. ## CLI / Monitoring as Code To manage monitors as code, define them in `yorker.config.yaml` and deploy with `yorker deploy`. ### HTTP monitor ```yaml project: my-app monitors: - name: API Health type: http url: https://api.example.com/health method: GET frequency: 1m locations: - loc_us_east - loc_eu_central timeoutMs: 10000 followRedirects: true headers: Accept: application/json labels: - env:production - service:api assertions: - type: status_code operator: equals value: 200 - type: response_time max: 2000 - type: body_contains value: '"status":"ok"' - type: body_json_path path: $.version operator: exists - type: header_value header: content-type operator: contains value: application/json - type: ssl_expiry daysBeforeExpiry: 14 ``` ### HTTP authentication Three auth types are supported. 
Add an `auth` block to any HTTP monitor: ```yaml # Basic auth auth: type: basic username: "{{secrets.API_USER}}" password: "{{secrets.API_PASS}}" # Bearer token auth: type: bearer token: "{{secrets.AUTH_TOKEN}}" # API key header (defaults to X-API-Key) auth: type: api-key header: X-API-Key value: "{{secrets.API_KEY}}" ``` ### Assertion types | Type | Fields | Description | |------|--------|-------------| | `status_code` | `operator` (equals, not_equals, less_than, greater_than), `value` | Assert on HTTP status code. Operator defaults to `equals`. | | `response_time` | `max` | Fail if response takes longer than `max` milliseconds. | | `body_contains` | `value` | Fail if response body does not contain the string. | | `body_matches` | `pattern` | Fail if response body does not match the regex pattern. | | `body_json_path` | `path`, `operator` (equals, not_equals, contains, exists), `value` | Assert on a JSONPath expression. Operator defaults to `equals`. | | `header_value` | `header`, `operator` (equals, contains, exists), `value` | Assert on a response header. Operator defaults to `equals`. | | `ssl_expiry` | `daysBeforeExpiry` | Fail if SSL certificate expires within the given number of days. Defaults to 14. | | `openapi_conformance` | `specId`, `operationPath` (optional), `validateHeaders` (optional) | Validate the response against an OpenAPI spec registered in Yorker. See [Assertions](/docs/reference/assertions#openapi_conformance). | See the [Assertions reference](/docs/reference/assertions) for full details. ### Browser monitor — URL mode The simplest browser monitor: Yorker navigates to the URL, captures Core Web Vitals, takes screenshots, and runs assertions. No script to write. URL-mode browser monitors are currently created via the Web UI or the REST API. `yorker deploy` is scripted-only for browser monitors — create URL-mode monitors from the dashboard, or via `POST /api/checks` with `browserConfig.browserMode: "url"`. 
See the [REST API reference](/docs/reference/api#create-check). ### Browser monitor — scripted mode Scripted browser monitors run full Playwright TypeScript scripts. Reference the script file from your config: ```yaml monitors: - name: Checkout Flow type: browser script: ./monitors/checkout.ts frequency: 5m locations: - loc_us_east viewport: width: 1280 height: 720 screenshotMode: every_step videoEnabled: false ``` The script file (`./monitors/checkout.ts`) is a **Playwright script body**, not a full test file. The Yorker runner wraps your script in an async function and injects `page` and `context` (both standard Playwright objects) for you to use. That means: - **Do not** write `import { test } from "@playwright/test"` or any other `import`/`require` statements — the script has no module scope. - **Do not** wrap the code in `test(...)` or `describe(...)` — Yorker doesn't run the Playwright test runner. - **Do** write the body of your check directly, referencing `page` (a `Page`) and `context` (a `BrowserContext`) as if they were already in scope. - **Do** use `// @step: Name` comments to mark steps. Yorker captures a screenshot at each step (when `screenshotMode: every_step`) and surfaces step timing in the filmstrip view. ```typescript // @step: Go to shop await page.goto("https://shop.example.com"); // @step: Add to cart await page.click("text=Add to Cart"); // @step: Checkout await page.click("text=Checkout"); await page.waitForSelector(".order-confirmation"); ``` #### Browser configuration options (scripted mode, YAML) | Field | Default | Description | |-------|---------|-------------| | `script` | *(required)* | Path to the Playwright TypeScript script file, relative to the config file. | | `viewport` | `{ width: 1280, height: 720 }` | Browser viewport dimensions. | | `device` | *(none)* | Playwright device name for emulation (e.g., `"iPhone 14"`). | | `screenshotMode` | `every_step` | When to capture screenshots: `every_step`, `failure_only`, or `disabled`. 
| | `videoEnabled` | `false` | Whether to record video of the browser session. | | `timeoutMs` | `30000` | Maximum script execution time (5000-120000 ms). | ### MCP monitor MCP monitors check the health of a Model Context Protocol server over Streamable HTTP. On each run, Yorker: 1. Connects to the `endpoint` and calls `tools/list`. 2. Verifies every tool in `expectedTools` is present (if configured). 3. Runs each `testCalls` entry: invokes the tool and checks the result contains `expectedOutputContains` (if provided). 4. Optionally detects schema drift — tools that appeared, disappeared, or changed signatures since the last successful run. ```yaml monitors: - name: Docs MCP Server type: mcp endpoint: https://mcp.example.com/sse frequency: 5m timeoutMs: 30000 locations: - loc_us_east auth: type: bearer token: "{{secrets.MCP_TOKEN}}" expectedTools: - search_docs - fetch_page testCalls: - toolName: search_docs arguments: query: "pricing" expectedOutputContains: "Plans" detectSchemaDrift: true ``` #### MCP configuration options | Field | Default | Description | |-------|---------|-------------| | `endpoint` | *(required)* | Streamable HTTP endpoint URL of the MCP server. | | `timeoutMs` | `30000` | Request timeout (5000-120000 ms). | | `auth` | *(none)* | Same auth block shape as HTTP monitors (`basic`, `bearer`, `api-key`). | | `expectedTools` | *(none)* | Array of tool names that MUST be present. Missing tools fail the check. | | `testCalls` | *(none)* | Array of tool invocations to run. Each entry has `toolName`, optional `arguments` (plain object), and optional `expectedOutputContains`. | | `detectSchemaDrift` | `true` | Emit schema-drift events when the tool list or tool signatures change. | > Note: MCP monitors cannot be executed locally with `yorker test` — deploy them and watch results via `yorker results tail` or the dashboard. ### Labels Attach labels to any monitor. 
Labels serve two purposes: filtering and grouping in the dashboard, and emission as OTel resource attributes so you can slice telemetry by label in your observability backend. ```yaml monitors: - name: Payments API type: http url: https://api.example.com/payments labels: - env:production # key:value label - service:payments - critical # boolean label (becomes yorker.label.critical="true") ``` Labels must match the pattern `[a-zA-Z0-9][a-zA-Z0-9_.:-]*`, max 128 characters. Plain labels (no colon) emit as `yorker.label.<name>="true"`. Key-value labels emit as `yorker.label.<key>="<value>"`. Omitting the `labels` field leaves labels unmanaged by config — the CLI preserves whatever labels exist on the remote. Setting `labels: []` explicitly clears all labels. ### Defaults and groups To avoid repeating configuration across monitors, use `defaults` and `groups`. **Defaults** apply to all monitors unless overridden: ```yaml defaults: frequency: 5m locations: - loc_us_east - loc_eu_central http: timeoutMs: 15000 followRedirects: true assertions: - type: status_code value: 200 browser: viewport: width: 1280 height: 720 screenshotMode: every_step ``` **Groups** apply shared settings to a subset of monitors: ```yaml groups: - name: API Endpoints frequency: 1m locations: - loc_us_east - loc_us_west - loc_eu_central monitors: - name: Users API type: http url: https://api.example.com/users - name: Orders API type: http url: https://api.example.com/orders ``` The cascade order is: **defaults -> group -> monitor**. Each level overrides the previous. Per-monitor assertions **replace** defaults entirely (they do not merge). To clear inherited assertions, set `assertions: []` on the monitor. ### Frequency format Frequencies use a shorthand: `30s` (seconds), `5m` (minutes), `1h` (hours). Valid range: 10 seconds to 24 hours.
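As an illustration of the shorthand rules above, here is a minimal parser sketch (a hypothetical helper, not part of the Yorker CLI):

```typescript
// Hypothetical sketch of the frequency shorthand described above:
// `30s`, `5m`, `1h`, valid from 10 seconds to 24 hours.
function parseFrequencySeconds(input: string): number {
  const match = /^(\d+)([smh])$/.exec(input);
  if (!match) throw new Error(`invalid frequency: ${input}`);
  const unitSeconds: Record<string, number> = { s: 1, m: 60, h: 3600 };
  const seconds = Number(match[1]) * unitSeconds[match[2]];
  if (seconds < 10 || seconds > 24 * 3600) {
    throw new Error(`frequency out of range (10s to 24h): ${input}`);
  }
  return seconds;
}
```

For example, `parseFrequencySeconds("5m")` yields `300`, while `"5s"` is rejected as below the 10-second floor.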
## CLI imperative commands For quick one-off monitors, use the `monitors` commands instead of a config file: ### Create a monitor ```bash yorker monitors create --name "API Health" --type http --url https://api.example.com/health --frequency 1m ``` ### List monitors ```bash yorker monitors list yorker monitors list --type http --status enabled ``` ### View monitor details ```bash yorker monitors get "API Health" ``` ### Edit, pause, and resume ```bash yorker monitors edit "API Health" --frequency 30s --add-location loc_eu_central yorker monitors pause "API Health" yorker monitors resume "API Health" ``` ### Delete a monitor ```bash yorker monitors delete "Old Endpoint" --yes ``` See the [CLI reference](/docs/reference/cli) for the full list of monitor commands and flags. ## Natural language To create a monitor using natural language, use either the Web UI or the API. **Web UI:** Click **Describe in plain English** on the create monitor page and type a description like "Monitor our checkout flow every 2 minutes from US and EU, alert if it takes longer than 3 seconds." Yorker generates a Playwright script you can edit before saving. **API:** Send a POST request to `/api/checks/generate` with a description. The endpoint returns a generated Playwright script you can review, optionally refine, and save as a browser monitor: ```bash curl -X POST https://app.yorkermonitoring.com/api/checks/generate \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "description": "Navigate to https://shop.example.com, add the first product to cart, and verify the cart shows it.", "targetUrl": "https://shop.example.com" }' ``` You can also pass `previousScript` and `refinement` to iteratively improve an existing script. See the [REST API reference](/docs/reference/api#generate-playwright-script). ### Generate HTTP checks from an OpenAPI spec `/api/checks/generate` also accepts a `spec` field. 
Pass an existing spec ID, an OpenAPI URL Yorker should fetch, or the name of a spec already on your team — Yorker creates one HTTP check per operation, skips operations that already have a check, and returns the full list. This is the API equivalent of the spec import flow: ```bash curl -X POST https://app.yorkermonitoring.com/api/checks/generate \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "spec": { "source": "url", "specUrl": "https://api.example.com/openapi.json" }, "locations": ["us-east", "eu-west"], "frequencySeconds": 300 }' ``` The `spec` field is a discriminated union: pass `{"source": "id", "specId": "spec_..."}` to use an existing spec, `{"source": "url", "specUrl": "https://..."}` to fetch and (if needed) import a new one, or `{"source": "name", "specName": "..."}` to look up by name. If the spec yields more than 50 *new* operations the first call returns `409` with `requiresConfirmation: true` — re-submit with `"confirm": true` to proceed. See the [Generate Checks From OpenAPI Spec](/docs/reference/api#generate-checks-from-openapi-spec) reference for the full request/response shape and error codes. --- ## Set Up Alerts > How to configure alert rules — consecutive failures, multi-location correlation, SSL alerts, and SLO burn-rate alerts. # Set Up Alerts Alerts notify you when monitors detect problems. Each alert rule combines one or more **conditions** (what triggers the alert) with one or more **channels** (where the notification goes). ## Define alert channels To send notifications, first define your channels in the `alertChannels` block at the top of `yorker.config.yaml`. Each channel has a name (the key) and a type-specific configuration. 
```yaml alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" on-call-email: type: email addresses: - oncall@example.com - sre-team@example.com pagerduty: type: webhook url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}" method: POST headers: Authorization: "Token token={{secrets.PD_TOKEN}}" ``` ### Channel types | Type | Required fields | Description | |------|----------------|-------------| | `slack` | `webhookUrl` | Posts to a Slack incoming webhook. | | `email` | `addresses` (array, at least one) | Sends email to the listed addresses. | | `webhook` | `url` | Sends an HTTP request. `method` defaults to `POST`. Optional `headers` for auth. | ## Reference channels in alerts To attach a channel to an alert, reference it with the `@channel-name` syntax: ```yaml monitors: - name: API Health type: http url: https://api.example.com/health alerts: - conditions: - type: consecutive_failures count: 3 channels: - "@ops-slack" - "@on-call-email" ``` ## Alert conditions Each alert must have at least one condition. Multiple conditions on the same alert are combined with AND logic — all conditions must be met for the alert to trigger. ### consecutive_failures Triggers after a monitor fails a specified number of times in a row. ```yaml - type: consecutive_failures count: 3 # default: 2, min: 1 ``` ### response_time_threshold Triggers when response time exceeds a threshold. ```yaml - type: response_time_threshold maxMs: 5000 # milliseconds ``` ### multi_location_failure Triggers when a monitor fails from multiple locations within a time window. This reduces false positives from localized network issues. ```yaml - type: multi_location_failure minLocations: 2 # default: 2, min: 2 windowSeconds: 300 # default: 300 (5 minutes) ``` ### ssl_expiry Triggers when an SSL certificate is approaching expiration. 
```yaml - type: ssl_expiry daysBeforeExpiry: 14 # default: 14, min: 1 severity: warning # optional: critical | warning | info ``` ### ssl_certificate_changed Triggers when the leaf certificate's fingerprint changes between runs — useful for catching unexpected cert rotations and possible man-in-the-middle conditions. ```yaml - type: ssl_certificate_changed severity: critical ``` ### ssl_self_signed Triggers when the endpoint presents a self-signed (or otherwise untrusted-root) certificate. ```yaml - type: ssl_self_signed severity: critical ``` ### ssl_protocol_deprecated Triggers when the TLS handshake negotiates a protocol older than `minProtocol`. ```yaml - type: ssl_protocol_deprecated minProtocol: TLSv1.2 # default: TLSv1.2 (allowed: TLSv1.2, TLSv1.3) severity: warning ``` ### burn_rate Triggers when an SLO's error budget is burning faster than a threshold across a short window AND a long window (the Google SRE multi-window burn-rate alerting pattern). Requires an existing SLO — reference it by ID. ```yaml - type: burn_rate sloId: slo_abc123 burnRateThreshold: 14.4 # burn rate multiple (e.g. 14.4 = budget exhausted in ~2 days at a 30d SLO) longWindowMinutes: 60 # minimum 60 shortWindowMinutes: 5 # minimum 5, MUST be less than longWindowMinutes ``` Burn-rate alerts are automatically wired up when you set `burnRateAlerts: true` on an SLO (the default). Use a manual `burn_rate` condition only if you need custom threshold/window combinations beyond the built-in ones. See [Define SLOs](/docs/guides/define-slos) for the simpler path. ### baseline_anomaly Triggers when a performance metric drifts away from its learned baseline for several consecutive runs. Baselines are stored per `(check, location, hour-of-day, day-of-week)` bucket so a monitor that's slower on Monday mornings doesn't trip the alert every Monday. 
```yaml - type: baseline_anomaly metric: response_time # required sigmaThreshold: 3 # default: 3 (min: 2, max: 10) consecutiveCount: 3 # default: 3 (min: 2, max: 20, integer) direction: above # default: above (allowed: above | below | both) severity: warning # default: warning ``` **Supported metrics.** HTTP: `response_time`, `dns_lookup`, `tls_handshake`, `ttfb`, `content_transfer`. Browser: `lcp`, `fcp`, `cls`. **How the chain works.** On each result ingestion the engine reads the last N runs for this check+location, regardless of status. The alert fires only if all N are successful AND each deviates by more than `sigmaThreshold`·σ from its own time-bucketed baseline in the configured direction. Any non-success run inside the window breaks the chain, so this alert stays scoped to drift-style regressions rather than outages. Failures are not skipped over to reach earlier successes: the window simply slides forward until it again contains N successes. **Pick a reasonable threshold.** 3σ / 3 consecutive is a conservative starting point: under the normal assumption (and assuming run-to-run independence), the per-run false-positive rate at 3σ is ≈1-in-740 for one-sided checks (`direction: above` or `below`, the default) and ≈1-in-370 for two-sided (`direction: both`). Across 3 consecutive runs that compounds to ≈1-in-400-million one-sided or ≈1-in-50-million two-sided. In practice runs sharing a time bucket carry correlated noise (network conditions, regional perturbations), so treat the compounded figure as a theoretical ceiling. Tightening to 4σ / 5 consecutive buys near-zero false positives; loosening to 2σ / 2 consecutive is effectively a point-anomaly detector. **Direction.** `above` catches slowdowns (the common case for response-time metrics). `below` catches suspiciously-fast responses, which often indicate the runner short-circuiting past the real work (stale cache hits, 304 storms, redirect chains being skipped). 
`both` is useful for CLS-style vitals where either side is a UX regression. ### Severity All SSL-related conditions (including `ssl_expiry`), `mcp_schema_drift`, and `baseline_anomaly` accept an optional `severity` field with value `critical`, `warning`, or `info`. Severity is stored on the resulting alert instance and surfaces in the alerts dashboard: use it to distinguish "nice to know" rotations from genuine outages. `mcp_schema_drift` and `baseline_anomaly` default to `warning` (set by the shared schema); SSL conditions have no schema default and fall back to `critical` via the evaluator. ## Cascading alerts Alerts follow the same cascade as other monitor settings: **defaults -> group -> monitor**. Define alerts at any level: ```yaml defaults: alerts: - conditions: - type: consecutive_failures count: 2 channels: - "@ops-slack" groups: - name: Critical APIs alerts: - conditions: - type: consecutive_failures count: 1 channels: - "@ops-slack" - "@pagerduty" monitors: - name: Payments API type: http url: https://api.example.com/payments ``` When a monitor defines its own `alerts`, those **replace** the inherited alerts entirely. To clear inherited alerts, set `alerts: []` on the monitor. 
## Multi-tier alerting To escalate alerts based on severity, define multiple alert rules with different conditions and channels: ```yaml monitors: - name: Checkout Flow type: browser script: ./monitors/checkout.ts alerts: # Tier 1: Slack for initial failures - name: checkout-warning conditions: - type: consecutive_failures count: 2 channels: - "@ops-slack" # Tier 2: PagerDuty for persistent multi-location failures - name: checkout-critical conditions: - type: consecutive_failures count: 5 - type: multi_location_failure minLocations: 3 channels: - "@pagerduty" - "@on-call-email" # SSL expiry: early warning - name: checkout-ssl conditions: - type: ssl_expiry daysBeforeExpiry: 30 severity: warning channels: - "@ops-slack" # SSL rotation detection - name: checkout-ssl-rotation conditions: - type: ssl_certificate_changed severity: info channels: - "@ops-slack" ``` ## OTel trace linking When an alert fires, Yorker includes the OpenTelemetry trace ID in the notification payload. If your application propagates the W3C `traceparent` header, you can jump directly from an alert to the distributed trace in your observability backend (e.g., HyperDX, Jaeger, Grafana Tempo) to identify root cause. ## Web UI To create alerts through the dashboard: 1. Navigate to a monitor's detail page. 2. Click **Add Alert Rule**. 3. Select one or more conditions and configure thresholds. 4. Choose notification channels (create them in **Settings > Notification Channels** if needed). 5. Click **Save**. Alert rules created in the Web UI and the CLI are the same underlying resource. The CLI's `yorker deploy` command will detect and diff against rules created through the UI, and abort on drift unless you pass `--force` or `--accept-remote`. You can also view all alerts across monitors from the **Alerts** page in the dashboard. ## CLI alert management In addition to defining alerts in `yorker.config.yaml`, you can manage alert instances directly from the command line. 
### List active alerts ```bash yorker alerts list ``` Include resolved and recovered alerts with `--all`, or filter by monitor: ```bash yorker alerts list --monitor "Homepage" --all ``` ### Acknowledge and resolve ```bash yorker alerts ack ainst_abc123 yorker alerts resolve ainst_abc123 ``` ### View alert history ```bash yorker alerts history --since 7d ``` ### Create alert rules imperatively ```bash yorker alerts rules create \ --monitor "Homepage" \ --condition "consecutive_failures >= 3" \ --channel nch_abc123 \ --name "homepage-down" ``` Baseline-deviation rules use `baseline_anomaly:<metric>` (defaults to 3σ, 3 consecutive, above) or the explicit `baseline_anomaly:<metric>@<sigma>σ:<count>[:above|below|both]` form: ```bash yorker alerts rules create \ --monitor "Checkout API" \ --condition "baseline_anomaly:response_time" \ --channel nch_abc123 \ --severity warning yorker alerts rules create \ --monitor "Marketing site" \ --condition "baseline_anomaly:lcp@4σ:5:above" \ --channel nch_pagerduty \ --severity critical ``` See the [CLI reference](/docs/reference/cli) for the full list of alert commands and condition formats. --- ## Define SLOs > How to set Service Level Objectives — availability and performance targets with error budgets and burn rate alerts. # Define SLOs Service Level Objectives (SLOs) let you define a target for a monitor and track how much error budget remains. When a monitor's success rate drops below the target, Yorker calculates how fast you are consuming error budget and can alert before the budget runs out. Yorker supports two SLI types: - **Availability SLIs** — the ratio of successful check runs to total runs. Use this for uptime objectives. - **Performance SLIs** — the ratio of runs where response time is at or below a threshold. Use this for latency objectives (e.g. "p95 < 500 ms"). ## Why SLOs matter Raw uptime percentages hide how much room you have for incidents. An SLO of 99.9% over 30 days gives you an error budget of roughly 43 minutes of downtime.
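That 43-minute figure is plain arithmetic, which you can check directly (simple math, not Yorker code):

```typescript
// Error budget for a 99.9% availability target over a 30-day window.
const windowMinutes = 30 * 24 * 60;         // 43,200 minutes in 30 days
const allowedFailureFraction = 1 - 0.999;   // 0.1% of the window may fail
const errorBudgetMinutes = windowMinutes * allowedFailureFraction;
console.log(errorBudgetMinutes.toFixed(1)); // "43.2" minutes of downtime
```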
Burn rate alerts tell you when you are spending that budget faster than expected, so you can respond before it is exhausted — rather than reacting after a hard threshold is crossed. ## Define an availability SLO Add an `slos` block to your `yorker.config.yaml`. Each SLO references a monitor by name: ```yaml project: my-app alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" monitors: - name: API Health type: http url: https://api.example.com/health frequency: 1m slos: - name: API Availability monitor: API Health target: "99.9%" window: 30d burnRateAlerts: true channels: - "@ops-slack" ``` ### SLO fields | Field | Required | Description | |-------|----------|-------------| | `name` | Yes | A unique name for the SLO. | | `monitor` | Yes | The name of the monitor this SLO tracks. Must match a monitor defined in the same config. | | `target` | Yes | Target. Accepts `"99.9%"` (string with percent sign) or `99.9` (number treated as percentage). | | `window` | Yes | Evaluation window. Must be `7d`, `14d`, or `30d`. | | `burnRateAlerts` | No | Enable burn rate alerting. Defaults to `true`. | | `channels` | No | Notification channels for burn rate alerts. Uses `@channel-name` references to `alertChannels`. | | `enabled` | No | Whether the SLO is active. Defaults to `true`. | ### Target parsing The `target` field accepts two formats: - **String with percent:** `"99.9%"` — parsed as 99.9%, stored as 9990 basis points. - **Number:** `99.9` — treated as a percentage, stored as 9990 basis points. Both formats produce the same result. Basis points range from 1 to 9999 (0.01% to 99.99%). ### Window The evaluation window determines how far back Yorker looks when calculating availability and error budget. Only three values are accepted: `7d`, `14d`, or `30d` (or the equivalent integers `7`, `14`, `30`). ## Performance SLOs Performance SLOs track the fraction of runs whose response time meets a latency threshold. 
They are available through the REST API and dashboard today — not through `yorker.config.yaml`. Create one via the API: ```bash curl -X POST https://app.yorkermonitoring.com/api/slos \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "sloType": "check", "checkId": "chk_abc123", "name": "API Latency", "targetBasisPoints": 9500, "windowDays": 30, "sliType": "performance", "perfThresholdMs": 500, "burnRateAlertsEnabled": true }' ``` | Field | Description | |-------|-------------| | `sliType: "performance"` | Use a latency-based SLI. Default is `availability`. | | `perfThresholdMs` | Required for performance SLIs. Runs at or below this duration are considered compliant. | Performance SLOs share everything else with availability SLOs — burn rate alerts, error budgets, and evaluation windows. ## Third-party SLOs Yorker can also track SLOs against metrics emitted by services you do not own — external APIs, vendors, dependencies. When the runner observes an outbound call to a third-party host, Yorker extracts metrics and can roll them up into an SLO. Third-party SLOs are discovered and configured through the dashboard (**SLOs > New > Third-party**). See the [REST API reference](/docs/reference/api#slos) for the underlying `sloType: "third_party"` fields. ## Burn rate alerts When `burnRateAlerts` is enabled (the default), Yorker continuously calculates how fast error budget is being consumed. The system tracks burn rates over three windows: - **1-hour burn rate** — detects sudden spikes in failures. - **6-hour burn rate** — catches sustained degradation. - **24-hour burn rate** — identifies slow-burn issues. If the burn rate indicates the error budget will be exhausted before the window ends, an alert fires through the configured channels. Advanced users can also configure custom `burn_rate` alert conditions directly — see [Set Up Alerts](/docs/guides/set-up-alerts#burn_rate). 
## View SLO status To view SLO status in the dashboard, navigate to the **SLOs** section. Each SLO shows: - Current availability or performance ratio - Error budget total, consumed, and remaining - Burn rates at 1h, 6h, and 24h windows - Whether the SLO is currently met To check SLO status via the API: ```bash curl https://app.yorkermonitoring.com/api/slos/slo_abc123/status \ -H "Authorization: Bearer $YORKER_API_KEY" ``` ## Complete example ```yaml project: my-app alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" pagerduty: type: webhook url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}" monitors: - name: Homepage type: http url: https://example.com frequency: 1m - name: Checkout Flow type: browser script: ./monitors/checkout.ts frequency: 5m slos: - name: Homepage Availability monitor: Homepage target: "99.95%" window: 30d burnRateAlerts: true channels: - "@ops-slack" - name: Checkout Availability monitor: Checkout Flow target: "99.9%" window: 30d burnRateAlerts: true channels: - "@ops-slack" - "@pagerduty" ``` --- ## Notification Channels > How to create and manage notification channels for Slack, email, and webhook alerts. # Notification Channels Notification channels define where alerts are sent. Yorker supports three channel types: **Slack**, **email**, and **webhook**. Channels are shared across your team and can be referenced by any alert rule or SLO. ## Channel Types ### Slack Posts alert notifications to a Slack channel via an incoming webhook. ```yaml alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" ``` To set up a Slack webhook, create an [Incoming Webhook](https://api.slack.com/messaging/webhooks) in your Slack workspace and use the generated URL. ### Email Sends alert notifications to one or more email addresses. ```yaml alertChannels: on-call-email: type: email addresses: - oncall@example.com - sre-team@example.com ``` At least one address is required. 
### Webhook Sends an HTTP request to any URL. Use this for PagerDuty, Opsgenie, custom integrations, or any service that accepts webhooks. ```yaml alertChannels: pagerduty: type: webhook url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}" method: POST headers: Authorization: "Token token={{secrets.PD_TOKEN}}" ``` | Field | Default | Description | |---|---|---| | `url` | *(required)* | Webhook endpoint URL. | | `method` | `POST` | HTTP method. | | `headers` | *(none)* | Optional headers for authentication. | ## Create channels in the Web UI 1. Navigate to **Settings > Notification Channels**. 2. Click **Create Channel**. 3. Select the channel type (Slack, Email, or Webhook). 4. Fill in the required fields. 5. Click **Save**. Channels created in the UI and via the CLI are the same underlying resource. The CLI's `yorker deploy` detects and diffs against channels created through the UI. ## Define channels in YAML Add an `alertChannels` block at the top level of `yorker.config.yaml`: ```yaml alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" on-call-email: type: email addresses: - oncall@example.com pagerduty: type: webhook url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}" method: POST headers: Authorization: "Token token={{secrets.PD_TOKEN}}" ``` ### Reference channels in alerts Use the `@channel-name` syntax to attach a channel to an alert rule: ```yaml monitors: - name: API Health type: http url: https://api.example.com/health alerts: - conditions: - type: consecutive_failures count: 3 channels: - "@ops-slack" - "@on-call-email" ``` ### Reference channels in SLOs SLOs can also reference channels for burn rate alerts: ```yaml slos: - name: API Availability monitor: API Health target: "99.9%" window: 30d channels: - "@ops-slack" ``` ## Deploy ordering When you run `yorker deploy`, channels are created/updated **before** alerts and SLOs (phase 0), and deleted **after** all references are removed (phase Z). 
This ensures channels always exist when alerts or SLOs reference them. ## Customize incident notification templates Slack, email, and webhook channels support per-event template overrides that let you replace Yorker's default payload with your own Handlebars-rendered markup. The in-app editor at **Settings > Notification Channels > Templates** gives you: - A Handlebars editor with channel-appropriate syntax highlighting (JSON for Slack/webhook, HTML for email) - A live preview rendered against six canonical incident fixtures (single HTTP failure, multi-location burst, browser check, MCP, and more) - A library of starters and curated examples you can apply with one click, plus a diff view against the currently saved template - **Send test**: dispatches the current saved template to the real channel with the selected fixture (60-second cooldown per channel) See the [Incidents concepts page](/docs/concepts/incidents#user-editable-templates) for the render context and fallback semantics. Per-channel examples and helper reference live on each integration page: [Slack](/docs/integrations/slack#template-overrides), [Email](/docs/integrations/email#template-overrides), [Webhook](/docs/integrations/webhook#template-overrides). PagerDuty and ServiceNow template overrides are authored through the REST API only; UI editor support for those channel types is planned for a future release. ## Manage channels via API Channels can also be managed through the REST API. See the [REST API reference](/docs/reference/api) for endpoints. --- ## Deploy with CLI > How to deploy monitors, alerts, and SLOs from yorker.config.yaml using the Yorker CLI. # Deploy with CLI The Yorker CLI lets you define monitors, alerts, notification channels, and SLOs in a `yorker.config.yaml` file and deploy them with a single command. Changes are computed as a diff against the current remote state and applied in the correct order. 
## Install ```bash npm install -g @yorker/cli ``` ## Authenticate Generate an API key from **Settings > API Keys** in the dashboard. Export it as an environment variable: ```bash export YORKER_API_KEY=sk_... ``` The CLI also accepts `YORKER_API_URL` to point at a different control plane (defaults to `https://app.yorkermonitoring.com`). ## Scaffold a config To create a starter `yorker.config.yaml`, run: ```bash yorker init ``` The interactive wizard walks you through project name, first monitor URL, type, and frequency. Pass flags to skip prompts: ```bash yorker init --name my-app --url https://example.com --type http --frequency 5m ``` ## Config file structure The `yorker.config.yaml` file has these top-level sections: ```yaml project: my-app # Required. Project identifier. alertChannels: # Notification channel definitions. ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" defaults: # Default settings for all monitors. frequency: 5m locations: - loc_us_east - loc_eu_central groups: # Groups of monitors with shared settings. - name: API Endpoints frequency: 1m monitors: - name: Users API type: http url: https://api.example.com/users monitors: # Top-level monitor definitions. - name: Homepage type: http url: https://example.com slos: # Service Level Objectives. - name: Homepage Availability monitor: Homepage target: "99.9%" window: 30d maintenanceWindows: # Scheduled silences / pauses. - name: Weekly DB maintenance checks: all mode: pause # pause | continue startsAt: "2026-04-12T02:00:00Z" endsAt: "2026-04-12T03:00:00Z" recurring: true recurrenceRule: "FREQ=WEEKLY;BYDAY=SU" ``` See the [Configuration reference](/docs/reference/configuration#maintenancewindows) for the full `maintenanceWindows` schema. ## Commands ### yorker validate Validates the config file without deploying. 
Checks YAML syntax, Zod schema validation, script file existence, secret interpolation, and cross-references (e.g., SLOs referencing valid monitors, alerts referencing valid channels). ```bash yorker validate ``` Exit code 0 means the config is valid. Non-zero means errors were found — they are printed to stderr. ### yorker diff Shows what would change between your local config and the remote state without applying anything: ```bash yorker diff ``` Output shows each resource as CREATE, UPDATE, DELETE, or UNCHANGED, with field-level diffs for updates: ``` Yorker deploy plan for "my-app" Checks: + CREATE http "API Health" (60s, 2 locations) ~ UPDATE http "Homepage" ~ configJson.timeoutMs 30000 -> 15000 = UNCHANGED browser "Checkout Flow" Summary: 1 to create, 1 to update, 1 unchanged (dry run — no changes applied) ``` ### yorker deploy Applies changes to the remote state: ```bash yorker deploy ``` Resources are applied in dependency order: channels first, then checks (with label sync), then alerts, SLOs, and maintenance windows. See the [CLI reference](/docs/reference/cli#deploy-phases) for the full phase table. Remote resources that exist but are not in your config are reported but **not deleted** unless you pass `--prune`. ### yorker deploy --prune Deletes remote resources that are not present in the config file: ```bash yorker deploy --prune ``` This is useful for keeping the remote state in exact sync with the config. Prune deletions happen in the correct dependency order as part of the normal deploy phases — SLOs and alerts are removed early (phases A and B, before their parent checks in phase C), maintenance windows at phase I, and channels at phase Z. See the [CLI reference for `yorker deploy`](/docs/reference/cli#deploy-phases) for the full phase table. ### yorker deploy --force / --accept-remote If someone edits a YAML-managed resource via the web UI, the next `yorker deploy` detects the change and aborts with a drift report. 
Two flags control how to resolve: ```bash yorker deploy --force # local config wins — overwrite remote yorker deploy --accept-remote # skip drifted resources — keep remote as-is ``` See [Drift detection](#drift-detection) below for details. ### yorker status Displays the current state of all monitors: ```bash yorker status ``` ### yorker results tail Live-stream check results as they arrive: ```bash yorker results tail "Homepage" --interval 30s ``` ### yorker test Runs HTTP monitors locally against their configured URLs. Useful for validating URLs and auth before deploying: ```bash yorker test ``` Browser monitors are listed but not executed locally (use Playwright directly for local browser tests). ## Drift detection The CLI tracks the state of each deployed resource in `.yorker/.deploy-state.json` (gitignored, per-machine). After every successful deploy, it saves a config hash and the remote `updatedAt` timestamp for each resource. On the next deploy, it compares: - **Local changed?** — current config hash differs from the stored hash. - **Remote changed?** — remote `updatedAt` is newer than the stored timestamp, and the resource has `managedBy: "yaml"`. This produces four possible outcomes: | | Remote unchanged | Remote changed | |---|---|---| | **Local unchanged** | Skip | **Drift** (remote-only edit) | | **Local changed** | Normal update | **Conflict** (both sides changed) | When drift or conflicts are detected, the deploy aborts with a report showing which resources were affected. You have three options: 1. **Review and choose** — inspect the remote changes in the dashboard, then update your config to match or intentionally overwrite. 2. **`--force`** — local config wins, remote changes are overwritten. 3. **`--accept-remote`** — drifted resources are skipped, keeping their remote state. ### First deploy On the first deploy (no state file exists), drift detection is skipped entirely — the CLI creates the state file after a successful apply. 
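The four-outcome drift table above can be sketched as a small decision function. This is an illustrative sketch of the documented logic only, not the CLI's actual implementation:

```bash
# Illustrative sketch of the drift decision table (not the CLI's code).
drift_outcome() {
  local local_changed=$1 remote_changed=$2
  case "$local_changed,$remote_changed" in
    no,no)   echo "skip (unchanged)" ;;
    yes,no)  echo "normal update" ;;
    no,yes)  echo "drift (remote-only edit)" ;;
    yes,yes) echo "conflict (both sides changed)" ;;
  esac
}

drift_outcome no yes   # a remote-only edit aborts the deploy with a drift report
```

Only the two right-hand outcomes abort the deploy; `--force` and `--accept-remote` decide which side wins when they occur.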
### After `yorker pull` Every successful `yorker pull` rewrites `.yorker/.deploy-state.json` with a fresh snapshot of remote state, so the next deploy treats everything as a clean baseline with no drift. ### `.yorker/.deploy-state.json` This file is per-machine state and should not be committed. Add it to your `.gitignore`: ```text .yorker/.deploy-state.json ``` --- ## Secret interpolation To keep secrets out of your config file, use placeholder syntax. The CLI resolves these at deploy time from environment variables. ### `{{secrets.NAME}}` Looks up `YORKER_SECRET_NAME` first, then falls back to `NAME`: ```yaml auth: type: bearer token: "{{secrets.AUTH_TOKEN}}" ``` Set with `export YORKER_SECRET_AUTH_TOKEN=...` or `export AUTH_TOKEN=...`. ### `{{env.NAME}}` Looks up the environment variable directly: ```yaml monitors: - name: Staging API type: http url: "{{env.STAGING_URL}}/health" ``` ### `${NAME}` Legacy shorthand, equivalent to `{{env.NAME}}`. Not applied inside browser script files to avoid conflicts with JavaScript template literals. The CLI fails with a clear error if any placeholder is unresolved after interpolation. ## CI/CD integration Validate on push, preview changes on PRs with `yorker diff`, and deploy on merge. See the full [CI/CD Integration guide](/docs/guides/ci-cd) for complete GitHub Actions and GitLab CI workflows. ```yaml # GitHub Actions — quick start - run: npm install -g @yorker/cli - run: yorker validate - run: yorker diff - run: yorker deploy --force # CI owns the config if: github.ref == 'refs/heads/main' ``` Use `--force` in CI pipelines where the config file is the source of truth. If you want CI to preserve manual edits made via the dashboard, use `--accept-remote` instead. Set `YORKER_API_KEY` and any `YORKER_SECRET_*` variables at the job or workflow level so all steps — including `validate` — can resolve `{{secrets.*}}` placeholders. 
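The `{{secrets.NAME}}` lookup order described above can be sketched in bash. This is an approximation of the documented resolution rules, not the CLI's implementation, and `resolve_secret` is a hypothetical helper name:

```bash
# Approximation of the documented lookup order for {{secrets.NAME}}:
# try YORKER_SECRET_<NAME> first, then fall back to plain <NAME>.
# resolve_secret is a hypothetical name, not part of the CLI.
resolve_secret() {
  local name=$1
  local prefixed="YORKER_SECRET_${name}"
  if [ -n "${!prefixed-}" ]; then
    echo "${!prefixed}"
  else
    echo "${!name-}"
  fi
}

export AUTH_TOKEN=plain-value
export YORKER_SECRET_AUTH_TOKEN=prefixed-value
resolve_secret AUTH_TOKEN   # the prefixed variable wins
```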
## Complete example ```yaml project: my-app alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" on-call-email: type: email addresses: - oncall@example.com pagerduty: type: webhook url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}" method: POST headers: Authorization: "Token token={{secrets.PD_TOKEN}}" defaults: frequency: 5m locations: - loc_us_east - loc_eu_central http: timeoutMs: 15000 followRedirects: true assertions: - type: status_code value: 200 browser: viewport: width: 1280 height: 720 screenshotMode: every_step alerts: - conditions: - type: consecutive_failures count: 2 channels: - "@ops-slack" groups: - name: Critical APIs frequency: 1m locations: - loc_us_east - loc_us_west - loc_eu_central alerts: - conditions: - type: consecutive_failures count: 1 - type: multi_location_failure minLocations: 2 channels: - "@ops-slack" - "@pagerduty" monitors: - name: Payments API type: http url: https://api.example.com/payments assertions: - type: status_code value: 200 - type: response_time max: 1000 - name: Auth API type: http url: https://api.example.com/auth/health monitors: - name: Homepage type: http url: https://example.com - name: Checkout Flow type: browser script: ./monitors/checkout.ts frequency: 5m alerts: - name: checkout-warning conditions: - type: consecutive_failures count: 2 channels: - "@ops-slack" - name: checkout-critical conditions: - type: consecutive_failures count: 5 channels: - "@pagerduty" - "@on-call-email" slos: - name: Homepage Availability monitor: Homepage target: "99.95%" window: 30d channels: - "@ops-slack" - name: Checkout Availability monitor: Checkout Flow target: "99.9%" window: 30d channels: - "@ops-slack" - "@pagerduty" ``` --- ## CI/CD Integration > How to validate, diff, and deploy monitors automatically with GitHub Actions or GitLab CI. 
# CI/CD Integration Wire `yorker validate`, `yorker diff`, and `yorker deploy` into your CI pipeline to get config validation on every push, change previews on pull requests, and automatic deploys on merge. ## Prerequisites 1. **API key** -- generate one from **Settings > API Keys** in the dashboard. 2. **Store as a secret** -- add it as `YORKER_API_KEY` in your CI provider's secret store. 3. **Config committed** -- your `yorker.config.yaml` and any `monitors/` script files must be in version control. --- ## GitHub Actions Create `.github/workflows/yorker.yml` in your repository: ```yaml name: Yorker Monitoring as Code on: push: paths: - "yorker.config.yaml" - "monitors/**" pull_request: paths: - "yorker.config.yaml" - "monitors/**" env: YORKER_API_KEY: ${{ secrets.YORKER_API_KEY }} # Add YORKER_SECRET_* vars here if your config uses {{secrets.*}} interpolation jobs: validate: name: Validate config runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: 20 - run: npm install -g @yorker/cli - run: yorker validate diff: name: Preview changes if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository needs: validate runs-on: ubuntu-latest permissions: contents: read pull-requests: write issues: write steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: 20 - run: npm install -g @yorker/cli - name: Run diff id: diff run: | set +e OUTPUT=$(yorker diff --json 2>/dev/null) EXIT_CODE=$? 
set -e EOF_MARKER=$(dd if=/dev/urandom bs=15 count=1 2>/dev/null | base64) echo "json<<$EOF_MARKER" >> "$GITHUB_OUTPUT" echo "$OUTPUT" >> "$GITHUB_OUTPUT" echo "$EOF_MARKER" >> "$GITHUB_OUTPUT" echo "exit_code=$EXIT_CODE" >> "$GITHUB_OUTPUT" - name: Comment on PR if: always() uses: actions/github-script@v7 env: DIFF_JSON: ${{ steps.diff.outputs.json }} with: script: | const esc = (s) => s.replace(/[|\\`*_{}[\]<>()#+\-!~@\n\r]/g, (ch) => ch === '\n' || ch === '\r' ? ' ' : ch === '@' ? '&#64;' : `\\${ch}`); const raw = process.env.DIFF_JSON ?? ''; let body; try { const result = JSON.parse(raw); if (!result.ok) { body = `### Yorker Diff\n\n:x: Error: ${result.error?.message ?? 'Unknown error'}`; } else { const changes = result.data?.changes ?? []; const actionable = changes.filter(c => c.type !== 'unchanged'); if (actionable.length === 0) { body = '### Yorker Diff\n\n:white_check_mark: No changes. Remote state matches local config.'; } else { const symbols = { create: '+', update: '~', delete: '-' }; const rows = actionable .map(c => `| ${symbols[c.type] ?? '?'} ${c.type} | ${esc(c.kind)} | ${esc(c.name)} |`) .join('\n'); body = `### Yorker Diff\n\n| Action | Type | Name |\n|---|---|---|\n${rows}\n\n${actionable.length} change(s) will be applied on merge.`; } } } catch { body = `### Yorker Diff\n\n:warning: Could not parse diff output.\n\n
<details><summary>Raw output</summary>\n\n\`\`\`\n${raw}\n\`\`\`\n</details>\n
`; } const comments = await github.paginate(github.rest.issues.listComments, { owner: context.repo.owner, repo: context.repo.repo, issue_number: context.issue.number, }); const existing = comments.find(c => c.user?.type === 'Bot' && c.body?.startsWith('### Yorker Diff') ); if (existing) { await github.rest.issues.updateComment({ owner: context.repo.owner, repo: context.repo.repo, comment_id: existing.id, body, }); } else { await github.rest.issues.createComment({ owner: context.repo.owner, repo: context.repo.repo, issue_number: context.issue.number, body, }); } - name: Fail on diff errors if: steps.diff.outputs.exit_code != '0' run: exit ${{ steps.diff.outputs.exit_code }} deploy: name: Deploy monitors if: github.event_name == 'push' && github.ref == 'refs/heads/main' needs: validate runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: 20 - run: npm install -g @yorker/cli - run: yorker deploy ``` ### How it works | Trigger | Job | What it does | |---|---|---| | Push or PR touching config files | **validate** | Validates YAML syntax and schema. Blocks the pipeline on errors. | | Same-repo pull request | **diff** | Runs `yorker diff --json`, parses the output, and posts a summary comment on the PR. Updates the same comment on subsequent pushes. Skipped for fork PRs. | | Push to main touching config files | **deploy** | Applies changes to the remote state. Runs after validation passes. | **Note:** The diff job's `if:` condition skips fork PRs, where `GITHUB_TOKEN` is read-only and repository secrets are not exposed. The validate job still runs on fork PRs, but will fail if your config uses `{{secrets.*}}` placeholders (since the corresponding environment variables won't be set). If you accept fork contributions, either avoid secret placeholders in validation-critical fields or add the same `full_name == github.repository` guard to the validate job. 
### Secrets The workflow uses workflow-level `env:` so all jobs (including `validate`) can resolve `{{secrets.*}}` and `{{env.*}}` placeholders. Add secrets referenced in your config as additional environment variables: ```yaml env: YORKER_API_KEY: ${{ secrets.YORKER_API_KEY }} YORKER_SECRET_SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} YORKER_SECRET_AUTH_TOKEN: ${{ secrets.AUTH_TOKEN }} ``` --- ## GitLab CI Create `.gitlab-ci.yml` in your repository: ```yaml stages: - validate - diff - deploy .yorker: image: node:20-slim before_script: - npm install -g @yorker/cli validate: extends: .yorker stage: validate rules: - changes: - yorker.config.yaml - monitors/** script: - yorker validate diff: extends: .yorker stage: diff rules: - if: $CI_PIPELINE_SOURCE == "merge_request_event" changes: - yorker.config.yaml - monitors/** script: - yorker diff variables: YORKER_API_KEY: $YORKER_API_KEY deploy: extends: .yorker stage: deploy rules: - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH changes: - yorker.config.yaml - monitors/** script: - yorker deploy variables: YORKER_API_KEY: $YORKER_API_KEY ``` Add `YORKER_API_KEY` to your project's **Settings > CI/CD > Variables** as a masked variable. Only mark it as protected if you restrict it to protected branches -- otherwise MR pipelines on unprotected branches won't have access. --- ## JSON output format Most CLI commands support `--json` for machine-readable output and share a consistent envelope. There are two exceptions: the interactive `yorker dashboard` command does not emit the envelope at all, and `yorker results tail --json` emits one JSON object per result (newline-delimited) instead of a single envelope, which allows streaming consumption. ### Success ```json { "ok": true, "data": { ... } } ``` ### Error ```json { "ok": false, "error": { "code": "general_error", "message": "3 config error(s): ..." } } ``` ### Exit codes These are the codes most relevant to CI pipelines (`validate`, `diff`, `deploy`). 
Other commands may use additional codes (e.g., `yorker status` exits `10` when monitors are unhealthy). | Code | Meaning | |---|---| | `0` | Success | | `1` | General error (validation failure, API error, missing config) | | `2` | Authentication failure | | `3` | Plan/quota limit exceeded | | `4` | Partial failure (some deploy operations succeeded, others failed) | ### Key command outputs **`yorker validate --json`** ```json { "ok": true, "data": { "valid": true, "monitors": 5, "slos": 2 } } ``` **`yorker diff --json`** ```json { "ok": true, "data": { "changes": [ { "type": "create", "kind": "check", "name": "API Health", "fieldChanges": [] }, { "type": "update", "kind": "check", "name": "Homepage", "fieldChanges": [ { "path": "configJson.timeoutMs", "oldValue": 30000, "newValue": 15000 } ] }, { "type": "unchanged", "kind": "check", "name": "Orders API", "fieldChanges": [] } ] } } ``` Each change has a `type` (`create`, `update`, `delete`, `unchanged`), a `kind` (`check`, `alert`, `slo`, `channel`), and a `fieldChanges` array (empty when there are no field-level differences). Actual CLI output also includes metadata fields such as `remoteId`, `local`, and `remote`, which are omitted here for brevity. **`yorker deploy --json`** Same as `diff`, plus an `applied` field with operation counts: ```json { "ok": true, "data": { "changes": [ ... ], "applied": { "created": 1, "updated": 1, "deleted": 0, "errors": [] } } } ``` If `applied.errors` is non-empty, the exit code is `4` (partial failure). --- ## Tips ### Pin the CLI version Installing the CLI adds only 2-3 seconds per job, but an unpinned install can pick up breaking changes between runs. To lock a specific version: ```bash npm install -g @yorker/cli@0.4.0 ``` ### Deploy with pruning To keep remote state exactly in sync (deleting monitors removed from config): ```bash yorker deploy --prune ``` Only use this if your config is the single source of truth. Monitors created through the web UI will be deleted. 
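### Branch on exit codes In wrapper scripts, the CI-relevant exit codes documented above map naturally onto a `case` statement. A sketch; `handle_yorker_exit` is a hypothetical helper, not part of the CLI:

```bash
# Sketch: act on the CI-relevant yorker exit codes documented above.
# handle_yorker_exit is a hypothetical helper, not part of the CLI.
handle_yorker_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "auth failure: check YORKER_API_KEY" ;;
    3) echo "plan or quota limit exceeded" ;;
    4) echo "partial failure: inspect applied.errors in --json output" ;;
    *) echo "general error" ;;
  esac
}

handle_yorker_exit 4
```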
### Gate deploys on diff To require an explicit approval step before deploying, separate the diff and deploy jobs and add a manual gate: ```yaml # GitLab CI deploy: stage: deploy when: manual script: - yorker deploy ``` ### Multiple environments Use environment variables to deploy different configs to different environments: ```yaml # GitHub Actions deploy-staging: env: YORKER_API_KEY: ${{ secrets.YORKER_API_KEY_STAGING }} steps: - run: yorker deploy deploy-production: env: YORKER_API_KEY: ${{ secrets.YORKER_API_KEY_PRODUCTION }} needs: [deploy-staging] steps: - run: yorker deploy ``` Each API key is scoped to a team, so the same config deploys to different teams. --- ## Private Locations > Deploy Yorker runners on your own infrastructure to monitor internal services, staging environments, and VPNs. # Private Locations A private location runs the Yorker check runner on your own infrastructure instead of Yorker's hosted fleet. Use it to monitor internal services, staging environments, or anything behind a VPN -- from inside your own network. The same monitors, alerting, and dashboards work identically; the only difference is where the agent executes. Private location runs are billed at 50% of hosted runs because you provide the compute. ## Prerequisites - A Yorker plan that includes private locations (free tier: 0; paid: up to 2; enterprise: unlimited) - Docker installed on the target machine - Outbound HTTPS access from the agent machine to `app.yorkermonitoring.com:443` -- no inbound connections are required --- ## Step 1: Create a Location To create a location via the CLI, provide a unique slug (`--name`) and a human-readable label (`--display-name`): ```bash yorker locations create --name hq-data-center --display-name "HQ Data Center" ``` The CLI also auto-generates a runner key. Use the key and location ID from the output when continuing with Step 3 below. To create a location via the dashboard: go to **Settings > Locations** and click **Add Location**. 
Both paths return a location ID in the form `loc_...`. Save it -- you will need it in the next step. --- ## Step 2: Generate a Runner Key Runner keys authenticate the agent to the control plane. Each key is shown only once. To generate a key via the CLI: ```bash yorker locations keys create ``` To generate a key via the dashboard: expand the location row, enter a key name, and click **Create Key**. Copy the key immediately -- it cannot be retrieved after the page is closed. --- ## Step 3: Deploy the Agent The repository includes a reference Docker Compose configuration at `docker/docker-compose.private-agent.yml` and an environment template at `docker/private-agent.env.example`. The template pre-fills `RUNNER_MODE`, `CONTROL_PLANE_URL`, and `LOCATION_TYPE` with their required values. You only need to supply the four values specific to your deployment: ```bash # Copy the reference environment file cp docker/private-agent.env.example docker/.env ``` Open `docker/.env` and fill in: ```bash RUNNER_API_KEY=rk_... # The runner key from Step 2 LOCATION_ID=loc_... # The location ID from Step 1 TEAM_ID=team_... # Included in the `yorker locations create` CLI output LOCATION_NAME=HQ Data Center # Human-readable label for logs and telemetry ``` Then start the agent: ```bash docker compose -f docker/docker-compose.private-agent.yml up -d ``` ### Choosing an image Two runner Dockerfiles are available: | Dockerfile | Includes | Use when | |---|---|---| | `Dockerfile.runner-heavy` (~1 GB) | Playwright + Chromium | You need browser checks | | `Dockerfile.runner-light` (~200 MB) | HTTP/API checks only | You only run HTTP/ping monitors | The default Compose file builds the heavy image. To switch to the lighter image, comment out the `yorker-agent` service in the Compose file and uncomment `yorker-agent-light`. --- ## Step 4: Verify Health After the agent starts, it begins polling the control plane. 
To verify it is connected: - **Dashboard:** the location's status badge turns green (Active) in **Settings > Locations**. - **CLI:** run `yorker locations list` and check the status column. Health states are derived from the time since the last successful poll: | Status | Meaning | |---|---| | Active | Last poll within the past 5 minutes | | Degraded | Last poll 5--10 minutes ago | | Offline | Last poll more than 10 minutes ago, or never connected | If the agent does not appear Active within a few minutes, see [Troubleshooting](#troubleshooting) below. --- ## Step 5: Assign the Location to a Monitor Monitors only execute in the locations they are assigned to. A monitor can be assigned to any mix of hosted and private locations. To assign via the dashboard: edit a monitor and open the location picker. Your private location appears under the **Private** group with its current health badge. To assign via `yorker.config.yaml`, add the location ID to the monitor's `locations` array: ```yaml monitors: - name: Internal API type: http url: https://api.internal.example.com/health locations: - loc_us_east # hosted location - loc_abcdef123456 # your private location ``` Deploy the config to apply: ```bash yorker deploy ``` --- ## Network Requirements The agent only needs outbound connectivity -- no inbound connections are required and no firewall rules need to be opened toward Yorker. | Requirement | Detail | |---|---| | Outbound HTTPS | `app.yorkermonitoring.com:443` -- for polling checks and submitting results | | Internal network access | The agent must be able to reach the services it monitors (e.g., internal hostnames, VPN resources) | There is no VPN tunnel from the agent to Yorker's infrastructure. All communication is initiated by the agent over standard HTTPS. 
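The health states from Step 4 reduce to a single rule on minutes since the last successful poll. A minimal sketch of that rule -- boundary handling in the real control plane may differ:

```bash
# Illustrative: map minutes since last successful poll to the Step 4 statuses.
health_status() {
  local minutes=$1
  if [ "$minutes" -le 5 ]; then
    echo "Active"
  elif [ "$minutes" -le 10 ]; then
    echo "Degraded"
  else
    echo "Offline"
  fi
}

health_status 7
```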
--- ## Troubleshooting | Symptom | Likely cause | Fix | |---|---|---| | Location stays Offline | Wrong runner key or location ID | Verify `RUNNER_API_KEY` and `LOCATION_ID` in `.env` match what was generated in Steps 1 and 2 | | Location stays Offline | Firewall blocking outbound HTTPS | Allow outbound connections from the agent machine to `app.yorkermonitoring.com:443` | | Location is Active but no checks run | Wrong `TEAM_ID` | The runner polls with team scoping -- verify `TEAM_ID` matches your team in Settings | | Checks not executing | Agent is connected but no checks assigned | Assign the private location to at least one monitor (Step 5) | | Image build fails | Missing Docker or build context | Run `docker build` from the repository root with `-f docker/Dockerfile.runner-heavy` | | Browser checks fail | Using the light image | Switch to `runner-heavy`, which includes Playwright and Chromium | --- ## Install Dashboards > One-click provisioning of pre-built Yorker synthetic monitoring dashboards into self-hosted HyperDX or ClickStack Cloud. # Install Dashboards Yorker ships 4 pre-built dashboards that provision directly into your observability platform via its Dashboard API. No manual chart creation required. Both self-hosted HyperDX and ClickStack Cloud (ClickHouse-managed HyperDX) are supported. ## Dashboards included | Dashboard | What it shows | |---|---| | **[Yorker] Synthetic Monitoring Overview** | Active monitors, uptime %, response time, success/failure trend, recent alert activity | | **[Yorker] Monitor Deep Dive** | Per-monitor drill-down: percentiles, Web Vitals, HTTP timing breakdown, recent failures with screenshot links | | **[Yorker] Status Page** | System status, uptime by monitor, daily trend, incident timeline | | **[Yorker] Trace Correlation** | Trace counts, duration trends, status distribution, recent traces with screenshot links | ## Install via the web UI 1. Go to **Settings > Telemetry (OTLP)** 2. 
Under **Dashboard Provisioning**, select your deployment type (**Self-hosted** or **ClickStack Cloud**) 3. For ClickStack Cloud: enter your Organization ID, Service ID, and Cloud API key credentials 4. Click **Save Changes**, then click **Install Dashboards** Self-hosted mode uses the same HyperDX API key already configured for telemetry. ClickStack Cloud uses ClickHouse Cloud API credentials (found in your ClickHouse Cloud console under API Keys). ## Install via the CLI ### Self-hosted HyperDX ```bash yorker dashboards install --hyperdx-api-key <key> ``` ### ClickStack Cloud ```bash yorker dashboards install --mode clickstack_cloud \ --clickstack-org-id <org-id> \ --clickstack-service-id <service-id> \ --clickstack-api-key-id <key-id> \ --clickstack-api-secret <secret> ``` All ClickStack options support environment variables (`CLICKSTACK_ORG_ID`, `CLICKSTACK_SERVICE_ID`, `CLICKSTACK_API_KEY_ID`, `CLICKSTACK_API_SECRET`). ### Options | Flag | Description | |---|---| | `--mode <mode>` | `self_hosted` (default) or `clickstack_cloud` (env: `HYPERDX_MODE`) | | `--hyperdx-api-key <key>` | HyperDX API key for self-hosted mode (env: `HYPERDX_API_KEY`) | | `--hyperdx-url <url>` | Override HyperDX API URL for self-hosted (default: `https://www.hyperdx.io`) | | `--clickstack-org-id <id>` | ClickHouse Cloud organization ID (env: `CLICKSTACK_ORG_ID`) | | `--clickstack-service-id <id>` | ClickHouse Cloud service ID (env: `CLICKSTACK_SERVICE_ID`) | | `--clickstack-api-key-id <id>` | ClickHouse Cloud API key ID (env: `CLICKSTACK_API_KEY_ID`) | | `--clickstack-api-secret <secret>` | ClickHouse Cloud API key secret (env: `CLICKSTACK_API_SECRET`) | | `--packs <names>` | Comma-separated subset: `overview`, `deep-dive`, `status-page`, `trace-correlation` | | `--force` | Skip duplicate check (may create additional copies) | | `--json` | Output results as JSON | ### Examples ```bash # Self-hosted: install all dashboards yorker dashboards install --hyperdx-api-key hdx_abc123 # Self-hosted: install only the overview dashboard yorker dashboards install --hyperdx-api-key 
hdx_abc123 --packs overview # ClickStack Cloud: install all dashboards via env vars export CLICKSTACK_ORG_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx export CLICKSTACK_SERVICE_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx export CLICKSTACK_API_KEY_ID=your-key-id export CLICKSTACK_API_SECRET=your-key-secret yorker dashboards install --mode clickstack_cloud ``` ## Idempotency Dashboards are named with a `[Yorker]` prefix. The install command checks for existing dashboards by name and skips duplicates automatically. Use `--force` to skip this check (note: this may create additional copies rather than replacing existing ones). ## Screenshots in dashboards The Deep Dive, Status Page, and Trace Correlation dashboards include `synthetics.screenshot.url` in their search panels. When a browser check captures screenshots, the URL appears as a clickable link directly in the dashboard. --- # Concepts ## Architecture > How Yorker's three-tier architecture works — Control Plane, Orchestrator, and Runners. # Architecture Yorker uses a three-tier architecture that separates check management from check execution. This design gives you per-execution isolation for browser checks, lets runners operate in 14 global regions, and lets one control plane coordinate alerting, SLOs, and insights across every monitor. Different check types and location types take different telemetry paths to your OTel backend — the [Telemetry flow](#telemetry-flow) section below lays out exactly which data travels which route. ## Three-tier model ### Control Plane The control plane is a Next.js application deployed on Vercel. It handles: - **Check definitions** -- creating, updating, and deleting monitors via the Web UI, CLI, or API. - **User accounts and teams** -- authentication (Clerk), team membership, API key management. - **Results storage** -- check metadata (pass/fail, response times, Web Vitals) stored in Postgres via Neon. 
- **Artifact storage** -- screenshots and debug artifacts stored in Cloudflare R2 (S3-compatible). - **API** -- RESTful endpoints for all operations, consumed by the dashboard, CLI, and runners. The control plane never executes checks itself. It defines *what* to monitor, stores check results in Postgres, evaluates alert rules and SLO burn, runs anomaly detection, and generates monitor and team insights. It also hands off OTel events to the orchestrator's outbox for asynchronous delivery — see [Telemetry flow](#telemetry-flow) for the full breakdown of which emissions come from the runner versus the control plane. ### Orchestrator The orchestrator is an always-on service running on Fly.io. It: 1. Polls Postgres on a schedule to find checks that are due for execution. 2. Dispatches Fly Machines in the correct region for each check. 3. Manages the machine pool lifecycle -- creating, reusing, and destroying machines. The orchestrator is the bridge between "this check should run every 5 minutes from London" and "spin up a machine in `lhr` right now." ### Runners Runners execute checks. They run on Fly.io across 14 global regions, close to the infrastructure being monitored. Every runner, regardless of mode, does the same three things for each check: 1. Executes the check (HTTP request, Playwright browser session, or MCP tool exchange). 2. Uploads screenshots directly to Cloudflare R2 (browser checks only). 3. Submits the full check result — timing breakdown, assertions, Web Vitals, certificates, network data — to the Yorker control plane via `POST /api/runner/results` for storage and evaluation. Some runners *additionally* emit OTLP metrics/traces directly from the runner to your collector — but not all of them do, and which do depends on check type and deployment mode. That split is explained in [Telemetry flow](#telemetry-flow) below. ## Three execution tiers Different check types have different isolation and resource requirements. 
Yorker uses three execution tiers: ### Tier 1 -- Ephemeral Heavy (Browser checks) Each browser check runs in its own Fly Machine with Playwright and Chromium (~1GB image). The machine is created for the check and destroyed after it completes. This gives you: - **Full isolation** -- no shared browser state between checks or tenants. - **Clean environment** -- no cookies, cache, or extensions carrying over. - **Predictable performance** -- no resource contention from other checks. ### Tier 3 — Per-Tenant Persistent (HTTP and MCP checks) HTTP and MCP checks run in lightweight Node.js containers (~200MB image). Each customer gets one persistent container per region. The container stays alive and executes checks as they come due. This gives you: - **Low latency** — no cold-start overhead for each check. - **Efficiency** — HTTP and MCP checks are fast and lightweight, so sharing a container is safe. - **Tenant isolation** — each customer has their own container, so one customer's checks cannot affect another's. ### Tier 2 — Ephemeral Light (Coming soon) Multi-step API tests will run in ephemeral lightweight containers — isolated like browser checks but without the Chromium overhead. ## Telemetry flow Yorker's emission model has two constants and one split: - **Constant 1:** every check run submits its result (status, timing, assertions, Web Vitals, certificates, screenshots, console logs, etc.) to the Yorker control plane via `POST /api/runner/results`. This is an internal submission protocol, not OTLP. It is how alerting, SLOs, insights, the dashboard, and the CLI all work. It happens whether or not you have configured an OTLP endpoint. 
- **Constant 2:** once you have configured a team OTLP endpoint under **Settings > Telemetry (OTLP)**, the control plane enqueues OTLP log and span events in an outbox whenever it has something worth telling your collector about — a completed or failed check, an alert state change, an SLO burn, a certificate rotation, a new insight, a deployment marker, a maintenance-window edit. The orchestrator (a separate always-on Fly service) drains that outbox every ~10 s, applies SSRF guards, and POSTs the OTLP payload to your collector. You can see every event type in `apps/web/src/lib/otel-events.ts` and the shipper in `apps/orchestrator/src/outbox-drain.ts`. **If no team OTLP endpoint is configured, the control plane skips the enqueue entirely** — the result still lands in Postgres and still drives alerting and the dashboard, but no OTel events are produced. - **The split:** **browser checks** *also* emit OTLP metrics and traces **directly from the runner** to your collector (when a team OTLP endpoint is configured; the orchestrator threads it into each browser execution payload). HTTP and MCP checks running on Yorker-hosted locations do **not** emit OTLP from the runner — the OTel signal for those runs is the `synthetics.check.completed` / `synthetics.check.failed` log event the control plane enqueues on your behalf (it carries the same dimensions: response time, status, assertions, timing breakdown). Private-location runners can opt in to runner-direct OTLP for HTTP and MCP checks by setting `OTLP_ENDPOINT`/`OTLP_API_KEY` as environment variables on the runner container at deploy time. The table below is the short version. **All "outbox → collector" rows assume you have configured a team OTLP endpoint** — without one, the outbox columns are skipped entirely. 
| Check type / location | Per-check metrics + traces | `check.completed` log event | Alerts / SLO / insight / cert / deployment events | |---|---|---|---| | **Browser**, any location | Runner → collector (direct) | Control plane → outbox → collector | Control plane → outbox → collector | | **HTTP / MCP**, Yorker-hosted | *(not emitted from runner today)* | Control plane → outbox → collector | Control plane → outbox → collector | | **HTTP / MCP**, private location | Runner → collector (direct) — only if you set `OTLP_ENDPOINT` on the agent container at startup | Control plane → outbox → collector | Control plane → outbox → collector | A few things to note: - **Every path targets the same `otlpEndpoint`.** Whether a signal is runner-direct or outbox-delivered, it lands in the same collector you configured under **Settings > Telemetry (OTLP)**. - **Every emission is OTLP HTTP JSON.** There is no proprietary ingestion format to learn. Runner-direct emission skips entirely if `otlpEndpoint` is unset (best-effort). The outbox path is retried with exponential backoff by the orchestrator. - **A runner crash skips both emission paths** for the affected run — the control plane only learns about the attempt if and when the runner submits a result. An orphaned attempt shows up as a gap in your result history, not an OTLP event. ### What this means for compliance and data flow - **Check results transit the Yorker control plane.** They land in Yorker's Postgres and power the dashboard, alerting, SLOs, and insights. What the control plane sees per run: timing breakdown, assertion pass/fail, HTTP status codes, truncated response bodies (for assertion re-evaluation and display), TLS certificate metadata, and — for browser checks — Web Vitals, network request metadata, console logs, screenshot *references* (the actual image bytes live in R2, uploaded directly by the runner), and step results. Retention depends on plan tier. 
- **For browser checks, per-check metrics and traces also reach your collector without traversing Yorker.** If you need low-latency, runner-direct OTel for browsers on hosted locations today, you already have it. - **For HTTP and MCP checks on Yorker-hosted locations, any OTel signals reaching your collector flow through the Yorker control plane** (and the orchestrator outbox), once you have configured an OTLP endpoint. This is a real architectural trade-off: it is how Yorker can enrich the `check.completed` event with things the runner doesn't know — SLO state, anomaly scores, alert context. - **If you need full runner-direct OTLP for HTTP and MCP checks**, run a private location and set `OTLP_ENDPOINT`/`OTLP_API_KEY` on the runner container at startup. The agent reads those env vars once and emits OTLP for every check it runs. ### Private locations Private locations let you run Yorker's runner inside your own network. The runner calls your internal services from inside your VPC (so internal hosts stay private), uploads screenshots directly to R2 (or a local fallback if R2 isn't configured), and POSTs results to the Yorker control plane over outbound HTTPS. That last path is mandatory — alerting, SLOs, and the dashboard all depend on it. You can opt your private runners into runner-direct OTLP for HTTP and MCP checks by setting `OTLP_ENDPOINT` and `OTLP_API_KEY` as environment variables on the runner container when you start it. Do that with an internal collector and no per-check telemetry ever leaves your network — only the result submission to the control plane egresses. Browser checks on private locations also emit runner-direct OTLP the same way hosted browser checks do. Derived events (alerts, SLO, cert, insight, check.completed) continue to flow via the control plane outbox → orchestrator → your collector. ## Screenshot pipeline For browser checks, screenshots flow through a dedicated pipeline: 1. 
**Capture** -- Playwright captures screenshots during script execution (every step, on failure only, or disabled).
2. **Upload** -- The runner uploads screenshots to Cloudflare R2 with a path scoped to the team, check, and run.
3. **Serve** -- The control plane API serves screenshots with team ownership validation. Only members of the team that owns the check can access its screenshots.

Screenshots are stored as R2 artifacts with retention based on your plan tier.

## Data tiering

Check result data is split into tiers based on how often it is accessed and how long it needs to be retained:

| Tier | What it stores | Retention | Storage |
|------|---------------|-----------|---------|
| **Tier A** (checkResults) | Core metrics: pass/fail, response time, status code, Web Vitals | Full plan retention | Postgres |
| **Tier B** (checkResultDetails) | Debug data: network request waterfalls, console logs, DOM snapshots | Shorter retention | Postgres (JSONB) |
| **R2 artifacts** | Screenshots, full network headers | Based on plan tier | Cloudflare R2 |

Tier A data is always stored and drives the dashboard, alerting, and SLO calculations. Tier B data is for debugging failed checks and is retained for a shorter window to manage storage costs.

---

## OpenTelemetry

> How Yorker emits standard OTLP signals and correlates with your existing observability stack.

# OpenTelemetry

Yorker is OTel-native. Every check produces standard OpenTelemetry signals -- metrics, traces, and log events -- delivered as OTLP HTTP JSON. There is no proprietary telemetry format. If your backend speaks OTel, it works with Yorker.

## Why OTel-native matters

Most synthetic monitoring tools store results in a proprietary system. When something breaks, you switch between your monitoring dashboard and your observability platform, manually correlating timestamps and URLs. Yorker eliminates that context switch. Your synthetic check results land in the same backend as your application traces, logs, and metrics.
A failing health check and the 500 error it triggered share the same trace ID. ## Metrics emitted Runner-direct OTLP emission (today: browser checks on any location, plus HTTP/MCP checks on private locations that have `OTLP_ENDPOINT` set) produces these metrics as OTLP gauge data points: | Metric | Type | Description | |--------|------|-------------| | `synthetics.http.response_time` | Gauge (ms) | Total response time from request start to last byte received. | | `synthetics.check.success` | Gauge (0 or 1) | Whether the check passed all assertions. | | `synthetics.dns.lookup_duration` | Gauge (ms) | Time spent resolving DNS. | | `synthetics.tls.handshake_duration` | Gauge (ms) | Time spent on TLS handshake. | | `synthetics.browser.lcp` | Gauge (ms) | Largest Contentful Paint (browser checks only). | | `synthetics.browser.fcp` | Gauge (ms) | First Contentful Paint (browser checks only). | | `synthetics.browser.cls` | Gauge (score) | Cumulative Layout Shift (browser checks only). | These metrics follow OpenTelemetry semantic conventions for synthetic monitoring where they exist, and use the `synthetics.*` namespace for domain-specific signals. Once you have configured an OTLP endpoint under **Settings > Telemetry (OTLP)**, the control plane outbox path produces matching log events (`synthetics.check.completed`, `synthetics.check.failed`) for every check regardless of type or location. Those log event bodies carry the same response time, status, and timing breakdown as the runner-emitted metrics, so HTTP and MCP checks on Yorker-hosted locations still land observable data in your collector — you just query log events instead of gauges for those. Until you configure an endpoint, no outbox events are enqueued at all. 
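For orientation, a single `synthetics.http.response_time` gauge data point has roughly this shape on the wire (a hand-written sketch following the OTLP/JSON encoding — the check ID, timestamp, and value are illustrative):

```json
{
  "resourceMetrics": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "synthetics" } },
        { "key": "synthetics.check.id", "value": { "stringValue": "chk_abc123" } },
        { "key": "synthetics.check.type", "value": { "stringValue": "browser" } }
      ]
    },
    "scopeMetrics": [{
      "metrics": [{
        "name": "synthetics.http.response_time",
        "unit": "ms",
        "gauge": {
          "dataPoints": [{
            "timeUnixNano": "1700000000000000000",
            "asDouble": 412.7
          }]
        }
      }]
    }]
  }]
}
```

Any OTLP-compatible backend can ingest this payload as-is; no Yorker-specific decoding is required.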
## Resource attributes Every metric, trace, and log event includes resource attributes that identify the check, location, and run: | Attribute | Example | Description | |-----------|---------|-------------| | `synthetics.check.id` | `chk_abc123` | Unique check identifier. | | `synthetics.check.name` | `Homepage` | Human-readable check name. | | `synthetics.check.type` | `http`, `browser`, or `mcp` | Check type. | | `synthetics.location.id` | `loc_us_east` | Location identifier. | | `synthetics.location.name` | `US East (Ashburn)` | Human-readable location name. | | `synthetics.location.type` | `hosted` or `private` | Whether the location is Yorker-hosted or a private location. | | `synthetics.run.id` | `run_xyz789` | Unique identifier for this specific execution. | | `url.full` | `https://example.com` | The URL being monitored. | | `service.name` | `synthetics` | Service name used by both runner-direct emissions and control-plane outbox events. | These attributes let you filter, group, and alert on synthetic check data in your observability backend the same way you would with any other OTel-instrumented service. ### Labels as resource attributes Any labels attached to a check are emitted as additional resource attributes on every metric and trace. This lets you slice telemetry by your own dimensions — environment, service, team, criticality — without having to map check IDs back to metadata in your observability backend. | Label format | Resource attribute | |---|---| | `env:production` | `yorker.label.env="production"` | | `service:payments` | `yorker.label.service="payments"` | | `critical` *(no colon)* | `yorker.label.critical="true"` | See [Create a Monitor → Labels](/docs/guides/create-monitor#labels) for how to attach labels. ## Trace correlation Yorker injects a W3C `traceparent` header into outbound requests during check execution. This is how it works: 1. The runner generates a trace ID for the check execution. 2. 
The `traceparent` header is added to the HTTP request (or injected into the browser's network requests for browser checks). 3. Your backend application picks up the trace context via its own OTel instrumentation. 4. The synthetic check span and your backend request span share the same distributed trace. The result: when a check fails, you can click from the Yorker alert directly to the distributed trace in your observability backend. You see the synthetic request, the backend handler, the database query, and the error -- in one view. ## Backend compatibility Yorker works with any OTel-compatible backend. All Yorker telemetry is emitted as OTLP HTTP JSON — the most widely supported OTel transport. Tested backends include: - **ClickStack** (ClickHouse + HyperDX) - **Grafana Cloud** (Tempo + Mimir) - **Datadog** - **Honeycomb** - **New Relic** - **Jaeger** - **Any OTLP-compatible collector** (OpenTelemetry Collector, Alloy, Vector) ## Emission model Yorker has **two OTel emitters**, and they both target the same `otlpEndpoint` you configure on your team: - **Runners** emit OTLP metrics, traces, and logs directly to your collector — but only for browser checks today. Hosted HTTP and MCP runners do not emit OTLP from the runner process; private-location operators can enable runner-direct emission by setting `OTLP_ENDPOINT`/`OTLP_API_KEY` on their runner container when they start it. - **The orchestrator** drains an emission outbox that the control plane writes to, and ships every OTel log/span event Yorker generates to your collector — including `synthetics.check.completed`, `synthetics.check.failed`, `synthetics.step.completed`, alert state changes, SLO burn warnings, TLS certificate events, monitor/team insights, deployment markers, and maintenance-window events. The control plane enqueues; the orchestrator polls the outbox every ~10 seconds, runs SSRF guards, and POSTs. 
The metrics catalogued above (`synthetics.http.response_time`, `synthetics.check.success`, and friends) are currently produced by the runner-direct path, which means you will see them for browser checks. For HTTP and MCP checks on hosted locations, the same information reaches your collector via the `synthetics.check.completed` / `check.failed` log events — they carry `responseTimeMs`, `status`, assertion results, timing breakdown, and the same resource attributes as the metrics, so dashboards and queries can key off either signal. See the [Telemetry flow section in Architecture](/docs/concepts/architecture#telemetry-flow) for the full table of which check type and location combinations take which path. ## Setup To configure OTel emission for your team: 1. Go to **Settings > Telemetry (OTLP)** in the Yorker dashboard. 2. Enter your OTLP endpoint URL (e.g., `https://otel-collector.example.com:4318`). 3. Add any required authentication headers (API key, bearer token). 4. Click **Test Connection** — Yorker's control plane dispatches a test payload and reports success or failure. 5. Save. From this point, the control plane starts enqueueing events for the orchestrator to ship (for every check type and every location), and browser-check runners start including the endpoint in each execution payload for runner-direct metric/trace emission. Team-level OTLP credentials are stored on the team, not per-check. --- ## Alert Correlation > How multi-location correlation and OTel trace linking reduce noise and speed up root cause analysis. # Alert Correlation Synthetic monitors generate a lot of signals. Not every failure is a real outage -- network glitches, regional ISP issues, and transient errors produce false positives. Yorker uses multi-location correlation and consecutive failure thresholds to separate real incidents from noise, and OTel trace linking to get you from alert to root cause in one click. ## The noise problem A single-location failure usually means nothing. 
A DNS resolver in Frankfurt hiccups for 200ms. A CDN edge node in Sydney drops a connection. If you alert on every individual failure, you get paged for problems your users never notice. The question is not "did one check fail?" but "is the service actually down?" ## Multi-location correlation The `multi_location_failure` condition answers that question. It requires N of M monitoring locations to report failure within a time window before triggering an alert. For example, if your check runs from 6 locations and you configure `minLocations: 3`, the alert only fires when at least 3 locations fail in the same window. A single location flaking does not page you. ```yaml alerts: - name: Homepage Down conditions: - type: multi_location_failure minLocations: 3 channels: - "@pagerduty-oncall" ``` This eliminates geographic noise. If only Tokyo fails but Ashburn, London, Frankfurt, Singapore, and Sydney are all passing, the problem is regional -- not an outage. ## Consecutive failure thresholds The `consecutive_failures` condition handles a different class of noise: transient blips. A single timeout or 503 that resolves on the next check interval is not worth alerting on. ```yaml alerts: - name: API Degraded conditions: - type: consecutive_failures count: 5 channels: - "@ops-slack" ``` This alert only fires after 5 checks in a row fail. A one-off timeout is silently recorded in the check history but does not trigger a notification. 
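The semantics of the two conditions can be sketched in a few lines (illustrative Python, not Yorker's implementation — the function and parameter names here are invented):

```python
def consecutive_failures_met(results, count):
    """True once the trailing run of failed checks reaches `count`.

    `results` is ordered oldest-to-newest; True means the check passed.
    Illustrative sketch of the condition's semantics, not Yorker's code.
    """
    streak = 0
    for passed in results:
        streak = 0 if passed else streak + 1  # any pass resets the run
    return streak >= count


def multi_location_failure_met(failed_locations, min_locations):
    """True when at least `min_locations` locations failed in the window."""
    return len(failed_locations) >= min_locations


# A one-off timeout does not fire; five failures in a row do:
print(consecutive_failures_met([True, False, True, True, True], 5))            # False
print(consecutive_failures_met([True, False, False, False, False, False], 5))  # True
```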
## Multi-tier alerting Combine both conditions to build alert tiers that match your incident response workflow: ```yaml alerts: # Critical: multiple locations confirm the outage - name: Service Outage conditions: - type: multi_location_failure minLocations: 3 channels: - "@pagerduty-oncall" # Warning: persistent failures from any location - name: Service Degraded conditions: - type: consecutive_failures count: 5 channels: - "@ops-slack" # Info: SSL certificate expiring soon - name: SSL Expiry Warning conditions: - type: ssl_expiry daysBeforeExpiry: 14 channels: - "@on-call-email" ``` Critical alerts go to PagerDuty because multiple locations confirm the service is down. Warning alerts go to Slack because the issue is persistent but might be localized. Info alerts go to email for non-urgent action items. ## OTel trace linking When a check fails, the trace ID from that execution links directly to the distributed trace in your observability backend. The flow looks like this: 1. Runner executes the check and injects a `traceparent` header. 2. Your backend processes the request and records the trace. 3. The check fails (assertion failure, timeout, 5xx response). 4. Yorker creates an alert with the trace ID attached. 5. You click the trace link in the alert notification. 6. Your observability backend shows the full distributed trace: the synthetic request, your API handler, the database query that timed out, the error. This collapses the "what broke?" investigation from minutes of log searching to a single click. The synthetic check and the backend error are part of the same trace. The alerts dashboard shows all active, acknowledged, and recovered alerts across your monitors. ## Alert lifecycle Alerts follow a state machine: | State | Meaning | |-------|---------| | **ACTIVE** | The alert condition is met. Notifications have been sent. | | **ACKNOWLEDGED** | A team member has acknowledged the alert. No repeat notifications. 
| | **RESOLVED** | A team member manually resolved the alert. | | **RECOVERED** | The check started passing again. The alert auto-resolves. | When a check that triggered an ACTIVE alert starts passing again, the alert transitions to RECOVERED and a recovery notification is sent to the same channels. This closes the loop without manual intervention. Acknowledged alerts suppress repeat notifications but remain visible in the dashboard until the underlying issue is resolved or the check recovers. --- ## Incidents > How Yorker groups correlated alerts into incidents, tracks their lifecycle, and dispatches opinionated, investigator-grade notifications. # Incidents A Yorker incident is a correlated group of alerts treated as one investigable unit. Each incident has a fingerprint, a severity, a lifecycle, and a notification policy. Incidents reduce noise by collapsing many alerts into one ticket and by emitting structured, investigator-grade payloads to your channels. ## Why incidents exist A single alert answers "is this check failing right now?" It does not answer the question an on-call engineer actually needs: **what is the blast radius, and is it related to something else that's breaking?** Synthetic monitors often fire in bursts. An upstream DNS provider hiccups and ten HTTP checks page at once. A CDN edge degrades and browser checks across three regions turn red. Without correlation, you get ten pages for one problem. Yorker groups those alerts into an incident, computes a scoped hypothesis from the observations (HTTP status codes, locations, shared failing domains, symptom timing), and dispatches **one** ticket per channel per incident — not one per alert. ## The incident lifecycle Every incident moves through a small set of states. Each state transition is recorded as a first-class event and dispatched to subscribed channels. 
| State | Entered by |
| --------------- | ------------------------------------------------------------------ |
| `open` | Correlated alerts above the score threshold |
| `acknowledged` | A user clicks "Acknowledge" in the dashboard or API |
| `auto_resolved` | All member alerts recovered and the 15-minute cool-down elapsed |
| `closed` | A user closes the incident explicitly |
| `reopened` | A user reopens a previously closed/resolved incident |

The transient `reopened` → `open` transition is preserved in the event log so downstream consumers can replay the exact sequence.

## Event types

Every lifecycle transition emits one of these events. Every event carries the full observations + hypothesis snapshot, so a consumer replaying one event has complete context without querying back.

- `opened` — new incident created
- `alert_attached` — an additional alert joined an active incident
- `severity_changed` — severity escalated or de-escalated
- `acknowledged` — a user took ownership
- `note_added` — a user added a freeform note
- `auto_resolved` — all members recovered and cool-down elapsed
- `closed` — a user closed it
- `reopened` — a user reopened a previously resolved incident

Each event is persisted to `incident_events`, emitted as an OTel log record (if an OTLP endpoint is configured for the team), and dispatched to every channel subscribed to incidents for the team.

## Default notification routing

Different channel types have different sensible defaults.
Yorker opts into the minimum-noise routing that matches each channel's audience: | Channel | Receives by default | | ------------ | ------------------------------------------------------------------------------------ | | Slack | Every lifecycle event (timeline-style thread) | | Email | `opened`, `auto_resolved`, `closed` only (inboxes should not be a running timeline) | | Webhook | Every lifecycle event | | PagerDuty | `opened`, `acknowledged`, `auto_resolved`, `closed`, `reopened`, `note_added` | | ServiceNow | `opened`, `severity_changed`, `acknowledged`, `auto_resolved`, `closed`, `note_added` | PagerDuty skips `severity_changed` because the Events API v2 has no matching action. ServiceNow skips `reopened` because Yorker's reopen semantics don't map cleanly to ServiceNow's reopen concept — a Yorker "reopen" after a recurrence creates a new external ticket rather than mutating the old one. See the [Slack](/docs/integrations/slack), [PagerDuty](/docs/integrations/pagerduty), [ServiceNow](/docs/integrations/servicenow), [Email](/docs/integrations/email), and [Webhook](/docs/integrations/webhook) integration pages for the exact payload shapes. ## Scoped hypothesis Every outbound incident payload carries a `hypothesis` block that tells the reader what Yorker thinks is going on — scoped to what an external synthetic sensor can prove: ```json { "hypothesis": { "summary": "Stripe API is returning 503/504; checkout is blocked.", "confidence": 0.75, "ruledIn": ["shared_failing_domain=api.stripe.com"], "ruledOut": [ "DNS resolution: NXDOMAIN not observed", "TLS: handshake completes" ], "correlationDimensionsMatched": ["shared_failing_domain", "error_pattern"], "scope": "external_symptoms_only" } } ``` `scope: external_symptoms_only` is the honesty baseline. Yorker can prove the external symptom — users cannot reach checkout — and can rule out classes of causes it directly measured (DNS, TLS, shared failing domains). 
It cannot see your backend logs, so it never claims the backend is the culprit.

## Dedupe + rate limiting

- **30s dedupe window** — a retry firing the same event to the same channel within 30 seconds is recorded as `skipped_dedupe` in `incident_notification_dispatches`, not sent again.
- **1-per-minute note rate limit** — per (channel, incident), a second `note_added` within 60 seconds of a prior send attempt (successful **or** failed) is recorded as `skipped_rate_limit`. Failed attempts count because each one still hit the upstream endpoint — a flaky webhook returning 5xx must not leak a retry burst past the cap. This prevents an operator running a backfill script from spamming hundreds of notes.

Both checks fail **open** on database errors — losing a notification is worse than double-sending one.

## User-editable templates

Every channel's default payload can be overridden with a Handlebars template attached to the notification channel. The rendering context matches `serializeIncidentEventForExport` plus a few helpers (`severityEmoji`, `eventEmoji`, `join`, `ifHasSource`, `jsonBody`). A render error or JSON-parse failure on the override **falls back to the default** and logs the error — a bad template never fails dispatch.

### In the web UI

For Slack, email, and webhook channels, **Settings > Notification Channels > Templates** opens a full editor with per-event tabs, a live preview rendered against canonical fixtures, a library of starter and example templates, a diff view comparing the draft against the last saved version, and a **Send test** button that dispatches the current saved template to the real channel. The editor is the recommended authoring path for these three channel types. PagerDuty and ServiceNow overrides are currently API-only.
### Via the API Template overrides are sent via the notification-channel API: ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "slack", "overrides": { "opened": { "blocks": "{\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"{{severityEmoji incident.severity}} {{incident.title}}\"}}]}" } } } }' ``` To disable a channel from receiving incident events (fall back to legacy per-alert dispatch), set `incidentSubscribed: false` on the channel. ## Audit trail Every dispatch writes one row to `incident_notification_dispatches` with status `sent`, `skipped_dedupe`, `skipped_rate_limit`, `skipped_not_routed`, or `failed`, plus any channel-specific response payload (PagerDuty `dedup_key`, ServiceNow `sys_id`). This is the source of truth for "did we actually notify?" — the UI will expose it in a later iteration. --- ## Locations > Available monitoring locations — 14 hosted regions on Fly.io and private locations. # Locations Yorker runs checks from 14 hosted locations distributed across six continents. You can also run checks from private locations inside your own infrastructure. ## Hosted locations Every hosted location runs on Fly.io infrastructure in the corresponding region. Checks execute close to your users, giving you realistic latency measurements from each geography. 
| Location ID | Display Name | Fly Region | |---|---|---| | `loc_us_east` | US East (Ashburn) | `iad` | | `loc_us_south` | US South (Dallas) | `dfw` | | `loc_us_west` | US West (Los Angeles) | `lax` | | `loc_na_north` | Canada (Toronto) | `yyz` | | `loc_sa_east` | South America (São Paulo) | `gru` | | `loc_eu_west` | Europe West (London) | `lhr` | | `loc_eu_west_2` | Europe West (Paris) | `cdg` | | `loc_eu_central` | Europe Central (Frankfurt) | `fra` | | `loc_eu_north` | Europe North (Stockholm) | `arn` | | `loc_ap_southeast` | Asia Pacific (Singapore) | `sin` | | `loc_ap_northeast` | Asia Pacific (Tokyo) | `nrt` | | `loc_ap_south` | Asia Pacific (Mumbai) | `bom` | | `loc_ap_oceania` | Oceania (Sydney) | `syd` | | `loc_af_south` | Africa (Johannesburg) | `jnb` | ## Using locations in config Specify locations as an array of location IDs in your monitor configuration: ```yaml monitors: - name: Homepage type: http url: https://example.com locations: - loc_us_east - loc_eu_central - loc_ap_northeast ``` When a check runs, it executes independently from each configured location. A check with 3 locations and a 5-minute frequency produces 3 results every 5 minutes — one per location. You can also set default locations for all monitors: ```yaml defaults: locations: - loc_us_east - loc_eu_central monitors: - name: Homepage type: http url: https://example.com # Inherits loc_us_east and loc_eu_central from defaults - name: APAC Health type: http url: https://example.com locations: - loc_ap_southeast - loc_ap_northeast # Overrides defaults with APAC locations ``` ## Plan limits The number of locations you can use depends on your plan: | Plan | Hosted Locations | Private Locations | |------|-----------------|-------------------| | **Free** | 1 | 0 | | **Yorker** ($29.99/mo) | All 14 | Up to 2 | | **Enterprise** | All 14 | Unlimited | Paid and enterprise plans have access to all 14 hosted locations. The free tier is limited to 1 location per check. 
See [Pricing](https://yorkermonitoring.com/pricing) for current plan details. ## Private locations Private locations let you run Yorker's runner inside your own infrastructure — behind a firewall, in a VPC, or on-premises. This is useful for monitoring internal services that are not publicly accessible. ### How they work 1. You deploy the Yorker runner container in your infrastructure (Docker, Kubernetes, or a VM), typically using the Docker snippet that `yorker locations create` prints for you. 2. The runner polls the Yorker control plane over outbound HTTPS for check assignments. No inbound ports needed. 3. The runner executes checks against your internal services — HTTP requests, browser sessions, and MCP tool calls all originate from inside your network, so internal hosts stay private. 4. The runner submits each check result (timing, assertions, Web Vitals, certificates, and screenshot references for browser checks) to the Yorker control plane via `POST /api/runner/results`. This drives alerts, SLOs, insights, and the dashboard — it is mandatory. 5. For browser checks the runner also emits OTLP metrics and traces directly to your collector. For HTTP and MCP checks, runner-direct OTLP emission is opt-in: set `OTLP_ENDPOINT` and `OTLP_API_KEY` as environment variables on the runner container at startup and the persistent-http agent will emit OTLP for every check it runs. 6. The control plane evaluates the result (alerts, SLO burn, baselines, insights). If a team OTLP endpoint is configured, it also enqueues OTel events — `synthetics.check.completed`, alert state changes, SLO warnings, certificate events, insights — in an outbox that the orchestrator polls and ships to your collector asynchronously. If no OTLP endpoint is configured, the enqueue is skipped entirely (results still power the dashboard and alerting, there are just no outbound OTel events). ### Key properties - **Outbound-only** — The runner initiates all connections. 
No firewall rules, no VPN tunnels, no inbound ports. - **Internal targets stay private** — The runner calls your internal services from inside your network, so you never have to expose a staging API or intranet dashboard on the public internet to monitor it. - **You can keep per-check OTLP inside your network** — Point `OTLP_ENDPOINT` on the runner container at an internal collector and the per-check metrics/traces/logs emitted by your private agent never leave your infrastructure. Check results still flow to the Yorker control plane over outbound HTTPS; derived events (alerts, SLO, insights, `check.completed`) still flow through the orchestrator outbox on the way to your collector. - **Same runner image** — Private locations use the same runner container as hosted locations. No separate software to maintain. - **Discounted runs** — Private browser runs are billed at 50% of the hosted rate. ### Setting one up See the [Private Locations guide](/docs/guides/private-locations) for a complete step-by-step walkthrough (create the location, generate a runner key, deploy the agent, verify health, and assign the location to monitors). 
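A private-location deployment can be sketched as a Docker Compose service. This is illustrative only: the image name (`yorker/runner`) and the `YORKER_RUNNER_KEY` variable are assumptions, so use the exact snippet that `yorker locations create` prints for you. `OTLP_ENDPOINT` and `OTLP_API_KEY` are the opt-in variables described above.

```yaml
# Hypothetical compose service for a private-location runner.
# Image name and YORKER_RUNNER_KEY are assumptions; copy the real
# snippet printed by `yorker locations create`.
services:
  yorker-runner:
    image: yorker/runner:latest          # assumption: actual image name may differ
    restart: unless-stopped
    environment:
      YORKER_RUNNER_KEY: "rkey_..."      # runner key secret (shown once at creation)
      # Optional: keep per-check OTLP inside your network
      OTLP_ENDPOINT: "http://otel-collector.internal:4318"
      OTLP_API_KEY: "..."
    # Outbound-only: the runner polls the control plane, so no ports are published
```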
You can drive the same flow from the CLI: ```bash # Create a private location (also auto-provisions an initial runner key and prints a Docker snippet) yorker locations create --name hq-data-center --display-name "HQ Data Center" # List private locations only (default) yorker locations list # Include hosted locations in the list too yorker locations list --all # Create another runner key for an existing location (secret is shown once) yorker locations keys create loc_abcdef123456 --name "eu-west-1" # List runner keys on a location (prefixes only) yorker locations keys list loc_abcdef123456 # Revoke a key by its ID yorker locations keys revoke loc_abcdef123456 rkey_abc123 # Delete a private location (also revokes all its runner keys) yorker locations delete loc_abcdef123456 ``` Once the runner is online, reference the private location ID in any monitor: ```yaml monitors: - name: Internal API type: http url: https://internal-api.staging.local/health locations: - loc_abcdef123456 - loc_us_east # Can mix private and hosted locations ``` --- # Integrations ## Slack > Send incident notifications to Slack via Block Kit — one timeline-style thread per incident, every lifecycle event. # Slack Yorker posts incident notifications to Slack via an [Incoming Webhook](https://api.slack.com/messaging/webhooks). Slack is the **timeline** channel — by default it receives every lifecycle event, so your channel becomes a running record of the incident. For the underlying model (lifecycle states, event types, scoped hypothesis), see [Incidents](/docs/concepts/incidents). ## Set up 1. In Slack, create an Incoming Webhook and copy the URL. 2. In Yorker, go to **Settings > Notification Channels**, click **Create Channel**, pick **Slack**, and paste the webhook URL. 3. The channel is subscribed to incidents by default. Wire it to any alert rule and it will participate in incident dispatch. 
Or via the API: ```bash curl -X POST https://yorkermonitoring.com/api/notification-channels \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "ops-channel", "channel": { "type": "slack", "webhookUrl": "https://hooks.slack.com/services/T.../B.../..." } }' ``` ## What gets posted Slack receives **every** incident event by default: - `opened` - `alert_attached` - `severity_changed` - `acknowledged` - `auto_resolved` - `closed` - `reopened` - `note_added` The `opened` event uses a rich Block Kit layout — header, severity, affected checks, locations, symptom window, shared failing domains, hypothesis, ruled-out list, and a **View in Yorker** button. Subsequent events are single-section status lines so the thread reads like a timeline. Example `opened` payload: ```json { "blocks": [ { "type": "header", "text": { "type": "plain_text", "text": "🔴 Incident opened — Checkout API outage" } }, { "type": "section", "text": { "type": "mrkdwn", "text": "*Severity*: `CRITICAL` · *Incident*: " } }, { "type": "section", "fields": [ { "type": "mrkdwn", "text": "*Affected checks*\nCheckout API" }, { "type": "mrkdwn", "text": "*Locations*\nloc_us_east_1, loc_eu_west_1" }, { "type": "mrkdwn", "text": "*Symptom window*\n2026-04-15T09:58:00Z → ongoing" }, { "type": "mrkdwn", "text": "*Shared failing domains*\napi.stripe.com" } ]}, { "type": "section", "text": { "type": "mrkdwn", "text": "*Hypothesis*\nStripe API is returning 503/504; checkout is blocked." 
} }, { "type": "section", "text": { "type": "mrkdwn", "text": "*Ruled out*\n• DNS resolution: NXDOMAIN not observed\n• TLS: handshake completes" } }, { "type": "context", "elements": [{ "type": "mrkdwn", "text": "Scope: `external_symptoms_only` — Yorker measures external symptoms only" }] }, { "type": "actions", "elements": [{ "type": "button", "text": { "type": "plain_text", "text": "View in Yorker" }, "url": "https://yorkermonitoring.com/dashboard/incidents/inc_abc" }] } ] } ``` ## Template overrides Every event's default payload can be replaced with a Handlebars-rendered Block Kit JSON string. The template renders against the full incident event context. ### Edit in the web UI The fastest way to author overrides is the per-channel template editor. Open **Settings > Notification Channels**, click **Templates** next to the Slack channel, and you get: - A Handlebars editor with JSON syntax highlighting - A live **preview** pane that renders your template against one of six canonical fixtures (single HTTP failure, multi-location burst, browser check, MCP, etc.) - A per-event drop-down so you can author one template per lifecycle event - A **library** sidebar with curated starter blocks and end-to-end examples (runbook-style, terse-ack, etc.) that apply with a click - **Compare with saved** diff view before you save - **Send test**: dispatches the current saved template to the real Slack webhook using the selected fixture (60-second cooldown per channel) - **Reset to default** per field: clears the override and falls back to the shipping payload Saved overrides apply immediately to the next incident dispatch for that channel. 
### Edit via the API ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "slack", "overrides": { "opened": { "blocks": "{\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"{{severityEmoji incident.severity}} *{{incident.title}}*\\n{{payload.hypothesis.summary}}\"}}]}" } } } }' ``` The `blocks` body must render to a JSON object with a `blocks: [...]` array. A render error or a parse error falls back to the default payload and logs a warning. A bad template never fails dispatch. Each body is capped at 32 KiB. Use `"default"` as the event key to define a single override that applies to every event type that doesn't have its own entry. ### Available helpers - `{{severityEmoji incident.severity}}` → `🔴 / 🟡 / 🔵` - `{{eventEmoji eventType}}` → `🚨 / ➕ / 🔺 / 👤 / ✅ / ☑️ / 🔁 / 📝` - `{{upperCase str}}`, `{{titleCase str}}` - `{{join array ", "}}` - `{{#ifHasSource "synthetic_http"}}…{{/ifHasSource}}` — `synthetic_http`, `synthetic_browser`, or `synthetic_mcp` - `{{jsonBody payload}}` — splat a value as raw JSON (already `JSON.stringify`d). In JSON-producing channels (Slack, webhook, PagerDuty, ServiceNow) escaping is disabled, so `{{jsonBody x}}` and `{{{jsonBody x}}}` are equivalent. In email HTML templates the double-stash form is HTML-escaped by default — use triple-stash only as an explicit opt-out. 
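The fallback contract (a template that fails to render or parse never fails dispatch) can be sketched as follows. This is a hypothetical illustration of the documented behavior, not Yorker's actual code:

```typescript
// Sketch of the documented fallback: an override must render to a JSON
// object carrying a `blocks` array; a render error, a parse error, or a
// wrong shape all fall back to the default payload.
function resolveSlackBlocks(
  rendered: string | null, // null models a Handlebars render error
  defaultPayload: { blocks: unknown[] },
): { blocks: unknown[] } {
  if (rendered === null) return defaultPayload;
  try {
    const parsed = JSON.parse(rendered);
    if (parsed && Array.isArray(parsed.blocks)) return parsed;
  } catch {
    // invalid JSON: fall through to the default
  }
  return defaultPayload;
}
```

With this shape, a bad override degrades to the shipping payload rather than dropping the notification, which matches the "a bad template never fails dispatch" guarantee above.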
### Render context The full context mirrors `serializeIncidentEventForExport`: - `eventId`, `eventType`, `incidentId`, `teamId`, `occurredAt` - `actor` — `{ type: "user" | "system", id }` - `payload` — the full event payload (observations, hypothesis, event-specific fields) - `incident` — `{ incidentId, title, severity, state, openedAt, triageUrl }` ## Disabling incident routing To fall back to the legacy per-alert Slack dispatch, set `incidentSubscribed: false` on the channel: ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -d '{ "incidentSubscribed": false }' ``` --- ## PagerDuty > Trigger PagerDuty incidents via Events API v2 with dedup_key = Yorker incident ID, observations and scoped hypothesis in custom_details. # PagerDuty Yorker drives PagerDuty through the [Events API v2](https://developer.pagerduty.com/docs/events-api-v2-overview). One Yorker incident maps to one PagerDuty alert; the `dedup_key` is the Yorker `incident_id`. Subsequent lifecycle events update the same PagerDuty alert rather than creating new ones. For the underlying model (lifecycle states, event types, scoped hypothesis), see [Incidents](/docs/concepts/incidents). ## Set up 1. In PagerDuty, add an **Events API v2** integration to the service that should receive Yorker alerts. Copy the **Integration Key** (also called the routing key). 2. Note your PagerDuty region: `us` for `events.pagerduty.com`, `eu` for `events.eu.pagerduty.com`. 3. 
In Yorker, create a notification channel of type `pagerduty` with the routing key and region: ```bash curl -X POST https://yorkermonitoring.com/api/notification-channels \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "pd-oncall", "channel": { "type": "pagerduty", "routingKey": "R0UTINGKEY1234567890ABCDEF", "serviceRegion": "us" } }' ``` | Field | Required | Default | Description | | ---------------- | -------- | ------- | ----------------------------------------- | | `routingKey` | yes | — | Events API v2 integration key | | `serviceRegion` | no | `us` | `us` or `eu` | ## Event mapping | Yorker event | PD `event_action` | Notes | | ----------------- | ----------------- | -------------------------------------------------- | | `opened` | `trigger` | First alert created | | `acknowledged` | `acknowledge` | Forwards the Yorker ack to PD | | `auto_resolved` | `resolve` | After cool-down elapses | | `closed` | `resolve` | User-initiated close | | `reopened` | `trigger` | Recurrence re-triggers the alert | | `note_added` | `trigger` | PD treats duplicate dedup_key triggers as updates | | `severity_changed`| *(skipped)* | Events API v2 has no matching action | | `alert_attached` | *(skipped)* | Internal-only | `dedup_key` is always the Yorker `incident_id`. `severity` maps `critical → critical`, `warning → warning`, `info → info`. 
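The event mapping above can be sketched as a lookup. This is an illustrative reimplementation of the documented table, not Yorker's source:

```typescript
// Yorker incident event → PagerDuty Events API v2 event_action,
// per the mapping table above. `null` means the event is skipped.
type PdAction = "trigger" | "acknowledge" | "resolve" | null;

function pdEventAction(eventType: string): PdAction {
  switch (eventType) {
    case "opened":
    case "reopened":
    case "note_added":   // PD treats duplicate dedup_key triggers as updates
      return "trigger";
    case "acknowledged":
      return "acknowledge";
    case "auto_resolved":
    case "closed":
      return "resolve";
    default:             // severity_changed, alert_attached
      return null;
  }
}
```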
Example `opened` payload: ```json { "routing_key": "R0UTINGKEY…", "event_action": "trigger", "dedup_key": "inc_abc", "payload": { "summary": "[Yorker] Checkout API outage", "source": "yorker", "severity": "critical", "class": "synthetic-monitoring", "custom_details": { "incident_id": "inc_abc", "event_type": "opened", "event_id": "ievt_001", "sources": ["synthetic_http"], "affected_checks": ["Checkout API"], "locations_affected": ["loc_us_east_1", "loc_eu_west_1"], "shared_failing_domains": ["api.stripe.com"], "hypothesis": { "summary": "Stripe API is returning 503/504; checkout is blocked.", "confidence": 0.75, "ruled_in": ["shared_failing_domain=api.stripe.com"], "ruled_out": ["DNS resolution: NXDOMAIN not observed", "TLS: handshake completes"], "scope": "external_symptoms_only", "correlation_dimensions_matched": ["shared_failing_domain", "error_pattern"] }, "actor": { "type": "system", "id": null } } }, "links": [ { "href": "https://yorkermonitoring.com/dashboard/incidents/inc_abc", "text": "View in Yorker" } ] } ``` When the incident is a recurrence of prior closed incidents, Yorker adds up to 5 prior incident links to the `links` array so the on-call engineer can pivot to history without leaving PD. ## Template overrides Only the `custom_details` block is user-editable. The envelope (routing key, event action, dedup key, severity, source, summary) is fixed so dedupe and severity semantics stay consistent across your tenants. > **UI editor note:** the in-app per-channel template editor (at **Settings > Notification Channels > Templates**) currently supports Slack, email, and webhook channels only. PagerDuty `customDetails` overrides are authored through the API below; a UI editor for PagerDuty is planned for a future release. 
```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "pagerduty", "overrides": { "opened": { "customDetails": "{\"runbook\":\"https://wiki.acme.com/runbooks/checkout\",\"yorker_link\":\"{{incident.triageUrl}}\",\"severity\":\"{{incident.severity}}\",\"hypothesis\":\"{{payload.hypothesis.summary}}\"}" } } } }' ``` The rendered string must parse as a JSON object. On render failure or invalid JSON the default custom_details block is used instead; dispatch does not fail. Helper and render context are the same as the [Slack integration](/docs/integrations/slack). ## Ack propagation Acknowledging in Yorker sends `event_action: "acknowledge"` to PagerDuty. Acknowledging in PagerDuty does **not** propagate back to Yorker in this release — bidirectional sync is tracked as a future enhancement. ## Disabling The PagerDuty channel type is **incident-pipeline-only** — it has no legacy per-alert dispatch path. `incidentSubscribed` is locked to `true` for PagerDuty channels; the API rejects create/update requests that set it to `false`. To stop routing incidents to a PagerDuty channel, delete it or remove it from the alert rule's channel list. --- ## ServiceNow > Create and update ServiceNow incident records via inbound REST, keyed by correlation_id = Yorker incident ID. # ServiceNow Yorker integrates with ServiceNow via inbound REST against the standard `incident` table. The first `opened` event creates a new incident record; subsequent lifecycle events update that record via `correlation_id` lookup — no `sys_id` round-trip required. For the underlying model (lifecycle states, event types, scoped hypothesis), see [Incidents](/docs/concepts/incidents). ## Set up Yorker needs an inbound REST credential with permission to create and update `incident` records. 1. 
In ServiceNow, create a service account (or re-use an existing integration user) with the **Incident** table permissions needed to `POST` and `PATCH`. 2. Note your instance URL (e.g. `https://acme.service-now.com`) and optionally the assignment group new incidents should route to. 3. Create the channel in Yorker: ```bash curl -X POST https://yorkermonitoring.com/api/notification-channels \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "snow-prod", "channel": { "type": "servicenow", "instanceUrl": "https://acme.service-now.com", "username": "yorker_integration", "password": "…", "assignmentGroup": "Network Operations" } }' ``` | Field | Required | Description | | ------------------ | -------- | --------------------------------------------------------------------- | | `instanceUrl` | yes | Base URL of the ServiceNow instance | | `username` | yes | Integration user | | `password` | yes | Password or OAuth token (stored at rest; treated as secret) | | `assignmentGroup` | no | Group name — maps to `assignment_group` on new incidents | Authentication is HTTP Basic over TLS. 
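The correlation-keyed update pattern can be sketched as a request builder. This is a hypothetical helper mirroring the documented lookup (Basic auth over TLS, updates keyed by `correlation_id` with no `sys_id` round-trip), not Yorker's implementation:

```typescript
// Build the Basic-auth header and the correlation_id-keyed PATCH URL
// that lifecycle updates target against the standard incident table.
function snowPatchRequest(
  instanceUrl: string,
  username: string,
  password: string,
  incidentId: string,
): { url: string; headers: Record<string, string> } {
  const token = Buffer.from(`${username}:${password}`).toString("base64");
  const query = encodeURIComponent(`correlation_id=${incidentId}`);
  return {
    url: `${instanceUrl}/api/now/table/incident?sysparm_query=${query}`,
    headers: {
      Authorization: `Basic ${token}`,
      "Content-Type": "application/json",
    },
  };
}
```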
## Event mapping

| Yorker event | Action | Notes |
| ----------------- | ------------------- | -------------------------------------------------------------- |
| `opened` | `POST /incident` | Creates record with `correlation_id = incident_id` |
| `severity_changed`| `PATCH` record | Updates `impact`, `urgency`, adds `work_notes` |
| `acknowledged` | `PATCH` record | Sets `state = 2` (Work in Progress), adds ack note |
| `note_added` | `PATCH` record | Appends to `work_notes` |
| `auto_resolved` | `PATCH` record | Sets `state = 6` (Resolved), fills `close_code` + `close_notes` |
| `closed` | `PATCH` record | Sets `state = 6` (Resolved), uses Yorker `closeReason` |
| `reopened` | *(skipped)* | A Yorker reopen creates a new external ticket rather than mutating the old |
| `alert_attached` | *(skipped)* | Internal-only |

Lifecycle updates target `PATCH /api/now/table/incident?sysparm_query=correlation_id=` — no `sys_id` is stored.

### Field mapping (on `opened`)

| ServiceNow field | Source |
| ---------------------- | --------------------------------------------------------------- |
| `short_description` | `payload.hypothesis.summary` (stripped of CR/LF, capped at 160) |
| `description` | Hypothesis summary + observations summary + triage URL |
| `work_notes` | `scope: external_symptoms_only` + ruled-out list |
| `impact`, `urgency` | `critical → 1`, `warning → 2`, `info → 3` |
| `correlation_id` | Yorker `incident_id` |
| `correlation_display` | `"Yorker Incident"` |
| `assignment_group` | Channel config (if set) |

### Severity mapping

| Yorker severity | ServiceNow `impact` | ServiceNow `urgency` |
| --------------- | ------------------- | -------------------- |
| `critical` | 1 (High) | 1 (High) |
| `warning` | 2 (Medium) | 2 (Medium) |
| `info` | 3 (Low) | 3 (Low) |

## Why reopened isn't routed ServiceNow's "Reopen" action reverts a closed incident to an active state on the **same** record.
Yorker's recurrence model treats a new burst of correlated alerts after auto-resolution as a **new incident** with a pointer back to the prior one (`recurrenceOf`). Those two semantics don't map cleanly, so Yorker creates a new ServiceNow record on recurrence (via the next `opened` event) rather than mutating the closed one. Use the **View in Yorker** link on each record to pivot between related incidents. ## Template overrides Three fields are user-editable: `short_description`, `description`, and `work_notes`. The envelope (`state`, `impact`, `urgency`, `correlation_id`, `assignment_group`) stays under Yorker's control so the lookup pattern and severity mapping remain consistent. > **UI editor note:** the in-app per-channel template editor (at **Settings > Notification Channels > Templates**) currently supports Slack, email, and webhook channels only. ServiceNow overrides are authored through the API below; a UI editor for ServiceNow is planned for a future release. ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "servicenow", "overrides": { "opened": { "shortDescription": "[{{upperCase incident.severity}}] {{incident.title}}", "description": "Yorker incident {{incident.incidentId}}\n\nHypothesis: {{payload.hypothesis.summary}}\n\nTriage: {{incident.triageUrl}}", "workNotes": "Scope: {{payload.hypothesis.scope}}\nRuled out:\n{{#each payload.hypothesis.ruledOut}} - {{this}}\n{{/each}}" } } } }' ``` Template render failures fall back to the default string and log a warning. Dispatch never fails on a bad template. Helpers and render context are the same as the [Slack integration](/docs/integrations/slack). `short_description` is always stripped of CR/LF (header injection guard) and capped at 160 characters after render. 
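The `short_description` guard described above can be sketched as follows. Illustrative only; the docs say CR/LF is stripped and the result capped at 160 characters, and this sketch assumes stripped newlines collapse to a single space:

```typescript
// Header-injection guard for short_description: strip CR/LF
// (collapsed to one space here, an assumption), then cap at 160 chars.
function sanitizeShortDescription(rendered: string): string {
  return rendered.replace(/[\r\n]+/g, " ").trim().slice(0, 160);
}
```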
## Disabling The ServiceNow channel type is **incident-pipeline-only** — it has no legacy per-alert dispatch path. `incidentSubscribed` is locked to `true` for ServiceNow channels; the API rejects create/update requests that set it to `false`. To stop routing incidents to a ServiceNow channel, delete it or remove it from the alert rule's channel list. --- ## Email > Email notifications for incident lifecycle milestones (opened, auto-resolved, closed) with multipart HTML + plain-text bodies. # Email Yorker sends incident notifications by email via AWS SES. Email is the **low-noise** channel: inboxes shouldn't be a running timeline, so only the lifecycle milestones that matter for async followers are routed by default. For the underlying model (lifecycle states, event types, scoped hypothesis), see [Incidents](/docs/concepts/incidents). ## Set up 1. Go to **Settings > Notification Channels**, create an **Email** channel, and add one or more recipient addresses. 2. Or via the API: ```bash curl -X POST https://yorkermonitoring.com/api/notification-channels \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "on-call-email", "channel": { "type": "email", "addresses": ["oncall@example.com", "sre-team@example.com"] } }' ``` | Field | Required | Description | | ----------- | -------- | ------------------------------------------------- | | `addresses` | yes | At least one valid email address | ## What gets sent Email dispatches on **three** events by default: - `opened`: full context mail with observations, hypothesis, ruled-out list, and an **Acknowledge in Yorker** button - `auto_resolved`: resolution mail - `closed`: close-reason mail `acknowledged`, `severity_changed`, `note_added`, `alert_attached`, and `reopened` are **not** routed to email by default. They'd turn the inbox into a timeline; use Slack or webhooks for timeline-style updates. 
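The low-noise default routing above can be sketched as a filter (illustrative, not Yorker's code):

```typescript
// Email dispatches only on lifecycle milestones by default;
// timeline-style events are excluded, per the defaults above.
const EMAIL_ROUTED_EVENTS = new Set(["opened", "auto_resolved", "closed"]);

function isRoutedToEmail(eventType: string): boolean {
  return EMAIL_ROUTED_EVENTS.has(eventType);
}
```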
Every mail is multipart: an HTML body for rich clients and a plain-text alternative for others. The HTML body is styled dark (`#000` background, `#111827` card) and includes: - Event type badge (color-coded: red for opened, teal for resolved/closed) - Incident ID, severity, affected checks, locations, shared failing domains - Hypothesis summary - Ruled-out list - **Acknowledge in Yorker** call-to-action Subject line format: ``` Yorker Incident — CRITICAL — Checkout API outage Yorker Incident Auto-Resolved — Checkout API outage Yorker Incident Closed — Checkout API outage ``` CR/LF characters are stripped from every subject (header-injection guard) and the line is capped at 200 characters after render. ## Template overrides All three parts (subject, HTML body, plain-text body) can be overridden per event type. ### Edit in the web UI Open **Settings > Notification Channels** and click **Templates** next to the email channel. The editor has three tabs (`subject` / `html` / `text`), so you can author each part independently and see a live rendered preview: - The HTML tab previews in a sandboxed iframe with no same-origin access, so user-supplied markup can't escape into the settings UI - The subject tab shows a live rendered preview. 
The 200-character cap is enforced after render at dispatch time; the editor's character counter reflects template source length, not rendered length, so keep rendered subjects concise - The **fixture selector** picks which canonical incident context to render against (single HTTP failure, multi-location burst, browser check, MCP) - The **library** sidebar includes a branded incident-digest HTML example you can drop in as a starting point - **Send test** delivers to the channel's addresses via SES using the active fixture (60-second cooldown per channel) ### Edit via the API ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "email", "overrides": { "opened": { "subject": "[{{upperCase incident.severity}}] {{incident.title}}", "html": "
<h2>{{incident.title}}</h2><p>{{payload.hypothesis.summary}}</p><p><a href=\"{{incident.triageUrl}}\">View in Yorker</a></p>
", "text": "{{incident.title}}\n\n{{payload.hypothesis.summary}}\n\n{{incident.triageUrl}}" } } } }' ``` A render failure on any of the three parts falls back to the default version of **that** part, keeping the other overridden parts. Dispatch does not fail on a bad template. The HTML body part renders with Handlebars' default `{{...}}` **HTML escaping enabled**, so user-supplied strings in `incident.title`, `payload.hypothesis.summary`, etc. are safely escaped. The subject and text parts render with escaping disabled (plain text). Helper output is escaped too: `{{jsonBody x}}` is HTML-escaped by default. Use `{{{jsonBody x}}}` (triple-stash) only when you consciously need raw JSON interpolation into the HTML body; the double-stash form is always safe. Helpers and render context are the same as the [Slack integration](/docs/integrations/slack). ## SES requirements Yorker uses your platform's shared SES sender. Self-hosters must provision SES credentials via the `AWS_SES_REGION` / `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `SES_FROM_ADDRESS` environment variables. Dispatch fails with a logged error if SES is not configured. ## Disabling Set `incidentSubscribed: false` on the channel to fall back to the legacy per-alert email dispatch (one mail per alert, no aggregation). --- ## Webhook > Send every incident lifecycle event to your own HTTP endpoint as JSON, schema-versioned for forward compatibility. # Webhook The webhook integration posts a JSON body for every incident lifecycle event to your own HTTP endpoint. Use this for custom integrations, Opsgenie, Zapier, workflow engines, or anywhere Yorker doesn't ship a purpose-built adapter. For the underlying model (lifecycle states, event types, scoped hypothesis), see [Incidents](/docs/concepts/incidents). 
## Set up ```bash curl -X POST https://yorkermonitoring.com/api/notification-channels \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "incident-sink", "channel": { "type": "webhook", "url": "https://hooks.example.com/yorker-incidents", "method": "POST", "headers": { "Authorization": "Bearer ${INCOMING_TOKEN}" } } }' ``` | Field | Required | Default | Description | | ---------- | -------- | ------- | -------------------------------------------- | | `url` | yes | n/a | Destination endpoint | | `method` | no | `POST` | `POST` or `PUT` | | `headers` | no | n/a | Extra headers (e.g., auth). A `Content-Type` header (any casing) is rejected at create/update time. | Yorker always sends `Content-Type: application/json`. A user-supplied `Content-Type` header would break the documented body-parser contract and is refused by the channel schema. ## What gets sent The webhook channel receives **every** incident event by default: - `opened` - `alert_attached` - `severity_changed` - `acknowledged` - `auto_resolved` - `closed` - `reopened` - `note_added` ### Default payload ```json { "schema_version": 1, "event": { "eventId": "ievt_001", "incidentId": "inc_abc", "teamId": "team_123", "eventType": "opened", "actor": { "type": "system", "id": null }, "payload": { "eventType": "opened", "observations": { "sources": ["synthetic_http"], "syntheticHttp": { "affectedChecks": [{ "checkId": "chk_api", "checkName": "Checkout API" }], "symptomWindow": { "startedAt": "2026-04-15T09:58:00.000Z" }, "errorSignature": { "httpStatusCodes": [503, 504], "errorCategories": ["upstream_error"], "locationsAffected": ["loc_us_east_1", "loc_eu_west_1"], "sampleMessages": ["Bad Gateway", "Gateway Timeout"] }, "sharedFailingDomains": ["api.stripe.com"] } }, "hypothesis": { "summary": "Stripe API is returning 503/504; checkout is blocked.", "confidence": 0.75, "ruledIn": ["shared_failing_domain=api.stripe.com"], "ruledOut": ["DNS resolution: NXDOMAIN not 
observed", "TLS: handshake completes"], "correlationDimensionsMatched": ["shared_failing_domain", "error_pattern"], "scope": "external_symptoms_only" }, "title": "Checkout API outage", "severity": "critical", "fingerprintHash": "…", "memberAlertInstanceIds": ["ainst_1", "ainst_2"], "recurrenceOf": [] }, "occurredAt": "2026-04-15T10:00:00.000Z" }, "incident": { "incidentId": "inc_abc", "title": "Checkout API outage", "severity": "critical", "state": "open", "openedAt": "2026-04-15T10:00:00.000Z", "triageUrl": "https://yorkermonitoring.com/dashboard/incidents/inc_abc" } } ``` ### schema_version Every default payload carries `schema_version: 1`. Gate your consumer on this field and Yorker will not silently break your integration when the default shape evolves: breaking changes bump the version; additive changes don't. ## Observations shape Each source in `observations.sources[]` (snake_case: `synthetic_http`, `synthetic_browser`, `synthetic_mcp`) has a matching camelCase block (`syntheticHttp`, `syntheticBrowser`, `syntheticMcp`) on the same object. A multi-source incident carries every relevant block. Example consumer: ```ts const obs = event.payload.observations; if (obs.sources.includes("synthetic_http")) { // obs.syntheticHttp is present const statusCodes = obs.syntheticHttp.errorSignature.httpStatusCodes; } ``` ## Template overrides Render your own JSON body with Handlebars. The rendered string must parse as valid JSON; on failure, the default payload is sent instead. ### Edit in the web UI Open **Settings > Notification Channels** and click **Templates** next to the webhook channel. The editor has JSON syntax highlighting, a live preview that renders your body against one of six canonical fixtures, and a **library** sidebar with two starter bodies (**Default: flat envelope** and **Nested: incident + event objects**) plus curated examples (PagerDuty Events API v2-shaped, OTel log record shape) you can apply with a click. 
**Send test** posts the current saved template to your webhook URL with the selected fixture's context (60-second cooldown per channel). The audit row for the test run is written to `incident_notification_dispatches` with `detail_json.isTest = true` so a real incident replay can be distinguished from a smoke-test. ### Edit via the API ```bash curl -X PUT https://yorkermonitoring.com/api/notification-channels/nch_abc \ -H "Authorization: Bearer $YORKER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "incidentTemplate": { "channelType": "webhook", "overrides": { "opened": { "body": "{\"type\":\"incident.opened\",\"id\":\"{{incident.incidentId}}\",\"severity\":\"{{incident.severity}}\",\"hypothesis\":\"{{payload.hypothesis.summary}}\",\"triage\":\"{{incident.triageUrl}}\"}" } } } }' ``` For payloads that want to splat in arbitrary nested structure without mustache-ing every key, use `{{jsonBody payload}}`. The render context has the following top-level keys (same as the [Slack integration](/docs/integrations/slack)). The event envelope fields mirror `serializeIncidentEventForExport`: `eventId`, `eventType`, `incidentId`, `teamId`, `actor`, `occurredAt`, `payload`. In addition, a materialized `incident` snapshot (`title`, `severity`, `state`, `openedAt`, `triageUrl`) is exposed for direct use in templates. There is no top-level `event` key; use the individual fields or helper as shown below. ```json { "body": "{\"type\":\"{{eventType}}\",\"id\":\"{{eventId}}\",\"occurredAt\":\"{{occurredAt}}\",\"actor\":{{{jsonBody actor}}},\"payload\":{{{jsonBody payload}}},\"incident\":{{{jsonBody incident}}} }" } ``` Notes: - JSON-producing templates (webhook, Slack, PagerDuty, ServiceNow) compile with Handlebars HTML escaping disabled, so for these channels `{{foo}}` and `{{{foo}}}` produce identical output. Triple-stash is shown here by convention: it makes the intent (raw interpolation into JSON) obvious to readers. 
Email HTML templates compile with escaping on, where the two forms are NOT equivalent: `{{jsonBody payload}}` gets HTML-escaped by default (safe) and `{{{jsonBody payload}}}` is an explicit opt-out of escaping that the template author must choose deliberately. - Handlebars' tokenizer fails on a mustache close that runs directly into a JSON `}`. Both `{{{foo}}}}` (triple-close + literal) and `{{foo}}}` (double-close + literal) raise a parse error. Add a space before the JSON close brace (`{{{foo}}} }`) to disambiguate. The rendered JSON is otherwise unchanged; if your consumer verifies a canonical-JSON HMAC over the body, re-serialize (e.g., `JSON.stringify(JSON.parse(body))`) before hashing so whitespace differences don't break the signature. A render error or invalid-JSON result falls back to the default payload and logs a warning. Dispatch never fails on a bad template. Helpers and render context are the same as the [Slack integration](/docs/integrations/slack). ## Delivery and retry - **Timeout:** Yorker expects a response within the platform HTTP timeout. Slow endpoints risk being recorded as `failed`. - **Retry:** Yorker does not retry failed webhook deliveries on the same event. Use the [audit trail](/docs/concepts/incidents#audit-trail) (`incident_notification_dispatches`) to replay deliveries from your own backfill tooling. - **Dedupe:** Within a 30-second window, a duplicate event to the same channel is recorded as `skipped_dedupe` and not re-sent. This protects against runner retry bursts. ## Disabling Set `incidentSubscribed: false` to fall back to the legacy per-alert webhook dispatch. --- # Reference ## Configuration (YAML) > Complete reference for yorker.config.yaml — every field, type, default, and constraint. # Configuration (YAML) Yorker uses a declarative YAML file to define monitors, alerts, SLOs, and notification channels as code. 
The CLI validates every field through the same Zod schemas that the API uses, so defaults and constraints are applied identically. To get started, create a `yorker.config.yaml` in your project root. --- ## Root Fields | Field | Type | Required | Description | |---|---|---|---| | `project` | `string` | Yes | Project name. Displayed in deploy plan output. | | `alertChannels` | `object` | No | Named notification channel definitions (key-value map). | | `defaults` | `object` | No | Default values inherited by all monitors. | | `groups` | `array` | No | Groups of monitors that share frequency, locations, and alerts. | | `monitors` | `array` | No | Top-level monitor definitions (outside any group). | | `slos` | `array` | No | Service Level Objective definitions. | | `maintenanceWindows` | `array` | No | Scheduled silences / pauses. See [`maintenanceWindows`](#maintenancewindows). | ```yaml project: "my-app" alertChannels: # ... defaults: # ... groups: # ... monitors: # ... slos: # ... ``` --- ## `alertChannels` To configure notification channels, define them as named entries under `alertChannels`. Alerts and SLOs reference channels by name using `@channel-name` syntax. Each channel must have a `type` field. The remaining fields depend on the type. ### Slack | Field | Type | Required | Description | |---|---|---|---| | `type` | `"slack"` | Yes | Channel type. | | `webhookUrl` | `string` (URL) | Yes | Slack incoming webhook URL. | ```yaml alertChannels: ops-slack: type: slack webhookUrl: "https://hooks.slack.com/services/T00/B00/xxxx" ``` ### Email | Field | Type | Required | Description | |---|---|---|---| | `type` | `"email"` | Yes | Channel type. | | `addresses` | `string[]` | Yes | At least one valid email address. | ```yaml alertChannels: team-email: type: email addresses: - oncall@example.com - devops@example.com ``` ### Webhook | Field | Type | Required | Default | Description | |---|---|---|---|---| | `type` | `"webhook"` | Yes | -- | Channel type. 
| | `url` | `string` (URL) | Yes | -- | Webhook endpoint URL. | | `method` | `"POST"` \| `"PUT"` | No | `"POST"` | HTTP method. | | `headers` | `object` | No | -- | Custom headers (key-value string pairs). | ```yaml alertChannels: pagerduty: type: webhook url: "https://events.pagerduty.com/v2/enqueue" method: POST headers: Content-Type: "application/json" ``` --- ## `defaults` To set values that apply to all monitors unless overridden, use the `defaults` block. Monitors and groups can override any default. | Field | Type | Default | Description | |---|---|---|---| | `frequency` | `string` | `"5m"` (300s) | Check interval. Format: `Ns`, `Nm`, or `Nh`. Range: 10s-86400s. | | `locations` | `string[]` | `["loc_us_east", "loc_eu_central"]` | Location IDs to run from. Must be non-empty. | | `http` | `object` | -- | Default HTTP check configuration. | | `browser` | `object` | -- | Default browser check configuration. | | `alerts` | `array` | -- | Default alert rules applied to all monitors. | ### Frequency Format The frequency value uses a duration string with a numeric value and a unit suffix. | Unit | Suffix | Example | Seconds | |---|---|---|---| | Seconds | `s` | `30s` | 30 | | Minutes | `m` | `5m` | 300 | | Hours | `h` | `1h` | 3600 | The resolved value in seconds must be between **10** and **86400** (24 hours). ```yaml defaults: frequency: "5m" locations: - loc_us_east - loc_eu_central - loc_ap_northeast ``` ### `defaults.http` To configure default values for all HTTP monitors, use the `defaults.http` block. | Field | Type | Default | Description | |---|---|---|---| | `timeoutMs` | `number` | `30000` | Request timeout in milliseconds. | | `followRedirects` | `boolean` | `true` | Whether to follow HTTP redirects. | | `maxRedirects` | `number` | `5` | Maximum number of redirects to follow. | | `assertions` | `array` | `[]` | Default assertion rules. See [Assertions](/docs/reference/assertions). 
| ```yaml defaults: http: timeoutMs: 15000 followRedirects: true maxRedirects: 3 assertions: - type: status_code value: 200 - type: response_time max: 5000 ``` ### `defaults.browser` To configure default values for all browser monitors, use the `defaults.browser` block. | Field | Type | Default | Constraints | Description | |---|---|---|---|---| | `timeoutMs` | `number` | `30000` | 5000-120000 | Script execution timeout in milliseconds. | | `viewport` | `object` | `{ width: 1280, height: 720 }` | -- | Browser viewport dimensions. | | `viewport.width` | `number` | `1280` | -- | Viewport width in pixels. | | `viewport.height` | `number` | `720` | -- | Viewport height in pixels. | | `screenshotMode` | `string` | `"every_step"` | `every_step` \| `failure_only` \| `disabled` | When to capture screenshots. | | `videoEnabled` | `boolean` | `false` | -- | Whether to record video. | | `device` | `string` | -- | -- | Playwright device name for emulation (e.g., `"iPhone 14"`). | ```yaml defaults: browser: timeoutMs: 60000 viewport: width: 1920 height: 1080 screenshotMode: every_step videoEnabled: false ``` ### `defaults.alerts` To set alert rules that apply to all monitors by default, define them under `defaults.alerts`. Each alert requires at least one condition and at least one channel reference. ```yaml defaults: alerts: - name: "default-failure-alert" conditions: - type: consecutive_failures count: 3 channels: - "@ops-slack" ``` See [Groups and Monitors](#groups) for how alert inheritance works. --- ## `groups` To organize monitors that share configuration, use groups. Groups can override `defaults` for frequency, locations, and alerts. | Field | Type | Required | Description | |---|---|---|---| | `name` | `string` | Yes | Group name (for display in deploy plan). | | `frequency` | `string` | No | Overrides `defaults.frequency` for all monitors in this group. | | `locations` | `string[]` | No | Overrides `defaults.locations` for all monitors in this group. 
| | `alerts` | `array` | No | Overrides `defaults.alerts` for all monitors in this group. | | `monitors` | `array` | Yes | Monitor definitions within this group. | ```yaml groups: - name: "US API endpoints" frequency: "1m" locations: - loc_us_east - loc_us_west alerts: - name: "api-down" conditions: - type: consecutive_failures count: 2 channels: - "@ops-slack" - "@team-email" monitors: - name: "Users API" type: http url: "https://api.example.com/v1/users" - name: "Orders API" type: http url: "https://api.example.com/v1/orders" ``` --- ## `monitors` ### HTTP Monitors To define an HTTP monitor, set `type: http` and provide a `url`. | Field | Type | Required | Default | Constraints | Description | |---|---|---|---|---|---| | `name` | `string` | Yes | — | 1-255 characters | Unique monitor name. | | `type` | `"http"` | Yes | — | — | Monitor type. | | `url` | `string` | Yes | — | Valid URL | Target URL to check. | | `method` | `string` | No | `"GET"` | `GET` \| `POST` \| `PUT` \| `DELETE` \| `PATCH` \| `HEAD` | HTTP method. | | `headers` | `object` | No | — | — | Custom request headers (key-value string pairs). | | `body` | `string` | No | — | — | Request body (ignored for GET and HEAD). | | `auth` | `object` | No | — | — | Authentication configuration. See [Auth](#auth). | | `followRedirects` | `boolean` | No | `true` | — | Whether to follow redirects. Overrides `defaults.http`. | | `maxRedirects` | `number` | No | `5` | — | Maximum redirects. Overrides `defaults.http`. | | `timeoutMs` | `number` | No | `30000` | — | Request timeout in ms. Overrides `defaults.http`. | | `assertions` | `array` | No | `[]` | — | Assertion rules. **Replaces** defaults (not merged). See [Assertions](/docs/reference/assertions). | | `frequency` | `string` | No | From defaults/group | 10s-86400s | Check interval. | | `locations` | `string[]` | No | From defaults/group | Non-empty | Location IDs. | | `alerts` | `array` | No | From defaults/group | — | Alert rules. 
| | `labels` | `string[]` | No | — | See [Labels](#labels) | Labels attached to this check. Emitted as OTel resource attributes. | | `enabled` | `boolean` | No | `true` | — | Whether this monitor is active. | ```yaml monitors: - name: "Homepage" type: http url: "https://www.example.com" method: GET timeoutMs: 10000 assertions: - type: status_code value: 200 - type: response_time max: 3000 - type: body_contains value: "Welcome" - name: "Create Order API" type: http url: "https://api.example.com/v1/orders" method: POST headers: Content-Type: "application/json" body: '{"item": "test", "quantity": 1}' auth: type: bearer token: "{{secrets.API_TOKEN}}" assertions: - type: status_code value: 201 ``` ### Browser Monitors Browser monitors defined in `yorker.config.yaml` are always scripted — point `script` at a Playwright TypeScript file and Yorker runs it on each check. URL-mode browser monitors (which navigate a single URL without a script) are currently created through the Web UI or the REST API only; they cannot be deployed via YAML yet. | Field | Type | Required | Default | Constraints | Description | |---|---|---|---|---|---| | `name` | `string` | Yes | — | 1-255 characters | Unique monitor name. | | `type` | `"browser"` | Yes | — | — | Monitor type. | | `script` | `string` | Yes | — | — | Path to Playwright script file (relative to config file). | | `steps` | `array` | No | — | — | Optional named steps. Each entry has `name` (must match a `// @step: Name` marker in the script), optional `timeoutMs`, and optional `assertions`. | | `viewport` | `object` | No | `{ width: 1280, height: 720 }` | — | Browser viewport dimensions. Overrides `defaults.browser`. | | `device` | `string` | No | — | — | Playwright device name for emulation. Overrides `defaults.browser`. | | `screenshotMode` | `string` | No | `"every_step"` | `every_step` \| `failure_only` \| `disabled` | Screenshot capture mode. Overrides `defaults.browser`. 
| | `videoEnabled` | `boolean` | No | `false` | — | Whether to record video. Overrides `defaults.browser`. | | `timeoutMs` | `number` | No | `30000` | 5000-120000 | Script timeout in ms. Overrides `defaults.browser`. | | `frequency` | `string` | No | From defaults/group | 10s-86400s | Check interval. | | `locations` | `string[]` | No | From defaults/group | Non-empty | Location IDs. | | `alerts` | `array` | No | From defaults/group | — | Alert rules. | | `labels` | `string[]` | No | — | See [Labels](#labels) | Labels attached to this check. | | `enabled` | `boolean` | No | `true` | — | Whether this monitor is active. | ```yaml monitors: - name: "Login Flow" type: browser script: "./monitors/login.ts" viewport: width: 1920 height: 1080 screenshotMode: every_step timeoutMs: 60000 frequency: "10m" locations: - loc_us_east - loc_eu_west ``` ### MCP Monitors To define an MCP monitor (for checking Model Context Protocol servers over Streamable HTTP), set `type: mcp` and provide an `endpoint`. | Field | Type | Required | Default | Constraints | Description | |---|---|---|---|---|---| | `name` | `string` | Yes | — | 1-255 characters | Unique monitor name. | | `type` | `"mcp"` | Yes | — | — | Monitor type. | | `endpoint` | `string` | Yes | — | Valid URL | Streamable HTTP endpoint of the MCP server. | | `timeoutMs` | `number` | No | `30000` | 5000-120000 | Request timeout in ms. | | `auth` | `object` | No | — | — | Same shape as HTTP [`auth`](#auth). | | `expectedTools` | `string[]` | No | — | — | Tool names that must be present. Missing tools fail the check. | | `testCalls` | `array` | No | — | — | Tool invocations to exercise. See below. | | `detectSchemaDrift` | `boolean` | No | `true` | — | Emit events when the tool list or tool signatures change. | | `frequency` | `string` | No | From defaults/group | 10s-86400s | Check interval. | | `locations` | `string[]` | No | From defaults/group | Non-empty | Location IDs. 
| | `alerts` | `array` | No | From defaults/group | — | Alert rules. | | `labels` | `string[]` | No | — | See [Labels](#labels) | Labels attached to this check. | | `enabled` | `boolean` | No | `true` | — | Whether this monitor is active. | `testCalls` entry fields: | Field | Type | Required | Description | |---|---|---|---| | `toolName` | `string` | Yes | Name of the tool to invoke. | | `arguments` | `object` | No | Plain JSON key-value map of arguments passed to the tool. | | `expectedOutputContains` | `string` | No | Substring that must appear in the tool's result. | ```yaml monitors: - name: "Docs MCP" type: mcp endpoint: "https://mcp.example.com/sse" frequency: "5m" auth: type: bearer token: "{{secrets.MCP_TOKEN}}" expectedTools: - search_docs - fetch_page testCalls: - toolName: search_docs arguments: query: "pricing" expectedOutputContains: "Plans" detectSchemaDrift: true locations: - loc_us_east ``` --- ## `auth` To authenticate HTTP requests, add an `auth` block to an HTTP monitor. Three authentication types are supported. ### Basic Auth | Field | Type | Required | Description | |---|---|---|---| | `type` | `"basic"` | Yes | Auth type. | | `username` | `string` | Yes | Username. | | `password` | `string` | Yes | Password. | ```yaml auth: type: basic username: "{{secrets.BASIC_USER}}" password: "{{secrets.BASIC_PASS}}" ``` ### Bearer Token | Field | Type | Required | Description | |---|---|---|---| | `type` | `"bearer"` | Yes | Auth type. | | `token` | `string` | Yes | Bearer token value. | ```yaml auth: type: bearer token: "{{secrets.API_TOKEN}}" ``` ### API Key | Field | Type | Required | Default | Description | |---|---|---|---|---| | `type` | `"api-key"` | Yes | -- | Auth type. | | `header` | `string` | No | `"X-API-Key"` | Header name to send the key in. | | `value` | `string` | Yes | -- | API key value. 
| ```yaml auth: type: api-key header: "X-Custom-Key" value: "{{secrets.CUSTOM_API_KEY}}" ``` --- ## `slos` To define Service Level Objectives, add entries to the `slos` array. Each SLO is linked to a monitor by name. | Field | Type | Required | Default | Constraints | Description | |---|---|---|---|---|---| | `name` | `string` | Yes | -- | 1-255 characters | Unique SLO name. | | `monitor` | `string` | Yes | -- | Must match a monitor `name` | Reference to the monitor this SLO tracks. | | `target` | `string` \| `number` | Yes | -- | Resolves to 1-9999 basis points | Availability target. `"99.9%"` or `99.9`. | | `window` | `string` \| `number` | Yes | -- | `"7d"` \| `"14d"` \| `"30d"` or `7` \| `14` \| `30` | Rolling window. | | `burnRateAlerts` | `boolean` | No | `true` | -- | Enable burn rate alerting. | | `channels` | `array` | No | `[]` | -- | Channel references for burn rate alerts (e.g., `["@ops-slack"]`). | | `enabled` | `boolean` | No | `true` | -- | Whether this SLO is active. | ### Target Format The target can be specified as a percentage string or a number: - `"99.9%"` -- parsed as 9990 basis points - `99.9` -- treated as a percentage, parsed as 9990 basis points - Valid range: 1-9999 basis points (0.01% to 99.99%) ### Window Format The window can be a duration string or a number: - `"7d"`, `"14d"`, `"30d"` -- string format - `7`, `14`, `30` -- numeric format (days) - Only these three values are allowed. ```yaml slos: - name: "Homepage Availability" monitor: "Homepage" target: "99.9%" window: "30d" burnRateAlerts: true channels: - "@ops-slack" - name: "API Uptime" monitor: "Users API" target: 99.95 window: 7d channels: - "@ops-slack" - "@team-email" ``` --- ## `maintenanceWindows` To silence alerts during scheduled work, add entries to `maintenanceWindows`. A window can pause checks entirely or let them continue running while suppressing notifications. 
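As a mental model of the two modes — a sketch of the semantics described above, not Yorker's scheduler code (all names here are illustrative, and only non-recurring windows are modeled):

```typescript
// Illustrative model of maintenance-window semantics: a `pause` window stops
// checks from running; either mode silences alerts while it covers a monitor.
type MaintenanceWindow = {
  mode: "pause" | "continue";
  checks: "all" | string[];
  startsAt: string; // ISO-8601
  endsAt: string;   // ISO-8601, after startsAt
};

// A window covers a monitor when the time is in range and the monitor is in scope.
function covers(w: MaintenanceWindow, monitor: string, at: Date): boolean {
  const inRange = at >= new Date(w.startsAt) && at < new Date(w.endsAt);
  const inScope = w.checks === "all" || w.checks.includes(monitor);
  return inRange && inScope;
}

// Checks stop running only under a `pause` window...
function isPaused(ws: MaintenanceWindow[], monitor: string, at: Date): boolean {
  return ws.some((w) => w.mode === "pause" && covers(w, monitor, at));
}

// ...but alerts are silenced under either mode.
function isSilenced(ws: MaintenanceWindow[], monitor: string, at: Date): boolean {
  return ws.some((w) => covers(w, monitor, at));
}
```

A recurring window would additionally expand its `recurrenceRule` into concrete occurrences before applying the same coverage test.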
> **Single-file format only.** `maintenanceWindows` is supported when you deploy from a single `yorker.config.yaml`. The directory format (`yorker init --format directory`) does not currently support a per-resource maintenance-window YAML file. If you need maintenance windows in code, stay on the single-file format. Also note that `yorker pull` does not export maintenance windows and overwrites the YAML file; see the [CLI reference for `yorker pull`](/docs/reference/cli#yorker-pull) for the round-trip caveat. | Field | Type | Required | Default | Description | |---|---|---|---|---| | `name` | `string` | Yes | — | Unique window name. | | `mode` | `string` | No | `pause` | `pause` (stop running checks) or `continue` (run but silence alerts). | | `checks` | `"all"` \| `string[]` | Yes | — | `"all"` or a list of monitor names covered by the window. | | `startsAt` | `string` | Yes | — | ISO-8601 start timestamp. | | `endsAt` | `string` | Yes | — | ISO-8601 end timestamp. Must be after `startsAt`. | | `recurring` | `boolean` | No | `false` | Enable recurrence. | | `recurrenceRule` | `string` | No | — | RRULE string (e.g., `FREQ=WEEKLY;BYDAY=SU`). Required when `recurring: true`. | ```yaml maintenanceWindows: - name: "Weekly DB maintenance" mode: pause checks: all startsAt: "2026-04-12T02:00:00Z" endsAt: "2026-04-12T03:00:00Z" recurring: true recurrenceRule: "FREQ=WEEKLY;BYDAY=SU" - name: "Black Friday cut-over" mode: continue checks: - "Homepage" - "Checkout Flow" startsAt: "2026-11-27T05:00:00Z" endsAt: "2026-11-27T06:00:00Z" ``` --- ## `alerts` (alert rule definitions) Alert rules live on monitors, in `groups`, or in `defaults`. Each rule has a list of `conditions` (ANDed together) and `channels` (channel references). | Field | Type | Required | Default | Description | |---|---|---|---|---| | `name` | `string` | No | — | Optional rule name. | | `enabled` | `boolean` | No | `true` | Whether the rule is active. 
| | `conditions` | `array` | Yes | — | At least one condition. | | `channels` | `string[]` | Yes | — | Channel references using `@channel-name` syntax. | ### Condition types | Type | Fields | Description | |---|---|---| | `consecutive_failures` | `count` (default `2`, min `1`) | Trigger after N failures in a row. | | `response_time_threshold` | `maxMs` (required) | Trigger when response time exceeds threshold. | | `multi_location_failure` | `minLocations` (default `2`, min `2`), `windowSeconds` (default `300`) | Trigger when failures correlate across multiple locations. | | `ssl_expiry` | `daysBeforeExpiry` (default `14`, min `1`), `severity` (optional) | Trigger when SSL cert approaches expiration. | | `ssl_certificate_changed` | `severity` (optional) | Trigger when the leaf certificate fingerprint changes between runs. | | `ssl_self_signed` | `severity` (optional) | Trigger when a self-signed or untrusted certificate is detected. | | `ssl_protocol_deprecated` | `minProtocol` (default `TLSv1.2`; allowed `TLSv1.2`, `TLSv1.3`), `severity` (optional) | Trigger when the handshake negotiates a protocol older than `minProtocol`. | | `burn_rate` | `sloId`, `burnRateThreshold`, `longWindowMinutes` (min `60`), `shortWindowMinutes` (min `5`, must be less than long) | SLO burn-rate alert. Most users let SLOs generate burn-rate alerts automatically via `burnRateAlerts: true`. | | `baseline_anomaly` | `metric` (required; `response_time`, `dns_lookup`, `tls_handshake`, `ttfb`, `content_transfer`, `lcp`, `fcp`, or `cls`), `sigmaThreshold` (default `3`, 2–10), `consecutiveCount` (default `3`, integer 2–20), `direction` (default `above`; allowed `above`, `below`, `both`), `severity` (default `warning`) | Trigger when the last `consecutiveCount` runs are all successes and each deviates by more than `sigmaThreshold`·σ from its own (hour × day-of-week × location) baseline in the configured direction. Any non-success run inside the window breaks the chain. 
| All SSL conditions (including `ssl_expiry`), `mcp_schema_drift`, and `baseline_anomaly` support an optional `severity` field: `critical`, `warning`, or `info`. `mcp_schema_drift` and `baseline_anomaly` default to `warning` from the schema; SSL conditions fall back to `critical` via the evaluator when unset. See [Set Up Alerts](/docs/guides/set-up-alerts) for worked examples. --- ## Labels Labels attach metadata to checks. They serve two purposes: 1. **Filtering and grouping** in the dashboard. 2. **OTel resource attributes** — Yorker emits every label as a `yorker.label.*` resource attribute on metrics and traces, so you can slice telemetry by label in your observability backend. Labels follow this format: `[a-zA-Z0-9][a-zA-Z0-9_.:-]*`, max 128 characters. | Label form | OTel attribute | |---|---| | `env:production` | `yorker.label.env="production"` | | `service:payments` | `yorker.label.service="payments"` | | `critical` *(no colon)* | `yorker.label.critical="true"` | ```yaml monitors: - name: "Payments API" type: http url: "https://api.example.com/payments" labels: - env:production - service:payments - critical ``` | Behavior | Meaning | |---|---| | `labels` omitted | Labels are **unmanaged** by config — the CLI preserves whatever labels exist on the remote. | | `labels: []` | Explicitly clears all labels on the check. | | `labels: [...]` | Sets the full list of labels on the check. | --- ## Secret Interpolation To inject secrets and environment variables into your config, use placeholder syntax. Placeholders are resolved at deploy time from environment variables. ### `{{secrets.NAME}}` Reads `YORKER_SECRET_NAME` first, then falls back to `NAME` from the environment. ```yaml auth: type: bearer token: "{{secrets.API_TOKEN}}" # Resolves: $YORKER_SECRET_API_TOKEN ?? $API_TOKEN ``` ### `{{env.NAME}}` Reads the environment variable `NAME` directly. 
```yaml url: "{{env.BASE_URL}}/health" # Resolves: $BASE_URL ``` ### `${NAME}` (Legacy) Reads the environment variable `NAME` directly. This is the legacy syntax, supported only in YAML config values. It is **not** applied inside browser script files because `${...}` conflicts with JavaScript template literals. ```yaml url: "${BASE_URL}/health" # Resolves: $BASE_URL ``` ### Script Interpolation Browser script files (`.ts` files referenced by `script:`) support `{{secrets.NAME}}` and `{{env.NAME}}` interpolation. The legacy `${NAME}` syntax is intentionally excluded from scripts to avoid conflicts with JavaScript template literals. ### Unresolved Placeholders Each placeholder that cannot be resolved produces a warning, and its raw text is left in place. Once interpolation completes, the CLI scans for any remaining unresolved placeholders and fails with an error listing each one and its location in the config, so an unresolved placeholder is never deployed as a literal value. --- ## Inheritance Rules Configuration values cascade through three levels: **defaults**, **group**, and **monitor**. The most specific value wins. ``` monitor > group > defaults ``` | Setting | Behavior | |---|---| | `frequency` | Monitor overrides group, group overrides defaults. Falls back to 300s (5m). | | `locations` | Monitor overrides group, group overrides defaults. Falls back to `["loc_us_east", "loc_eu_central"]`. | | `alerts` | Monitor overrides group, group overrides defaults. **Explicit presence (even an empty array) is an intentional override.** An empty `alerts: []` on a monitor disables all alerts for that monitor. | | `assertions` | Monitor-level assertions **replace** defaults entirely. They are **not merged** with `defaults.http.assertions`. This matches Terraform/Checkly semantics. | | `labels` | Monitor-only (no cascade). Omitting leaves labels unmanaged. `labels: []` clears all labels. | | HTTP config (`timeoutMs`, `followRedirects`, `maxRedirects`) | Monitor overrides defaults.
| | Browser config (`timeoutMs`, `viewport`, `device`, `screenshotMode`, `videoEnabled`) | Monitor overrides defaults. | --- ## Available Locations To see all available locations and their IDs, call `GET /api/locations`. | Location ID | Display Name | Fly Region | |---|---|---| | `loc_us_east` | US East (Ashburn) | `iad` | | `loc_us_south` | US South (Dallas) | `dfw` | | `loc_us_west` | US West (Los Angeles) | `lax` | | `loc_na_north` | Canada (Toronto) | `yyz` | | `loc_sa_east` | South America (São Paulo) | `gru` | | `loc_eu_west` | Europe West (London) | `lhr` | | `loc_eu_west_2` | Europe West (Paris) | `cdg` | | `loc_eu_central` | Europe Central (Frankfurt) | `fra` | | `loc_eu_north` | Europe North (Stockholm) | `arn` | | `loc_ap_southeast` | Asia Pacific (Singapore) | `sin` | | `loc_ap_northeast` | Asia Pacific (Tokyo) | `nrt` | | `loc_ap_south` | Asia Pacific (Mumbai) | `bom` | | `loc_ap_oceania` | Oceania (Sydney) | `syd` | | `loc_af_south` | Africa (Johannesburg) | `jnb` | --- ## Full Example ```yaml project: "acme-production" alertChannels: ops-slack: type: slack webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}" oncall-email: type: email addresses: - oncall@acme.com pagerduty: type: webhook url: "https://events.pagerduty.com/v2/enqueue" method: POST headers: Content-Type: "application/json" defaults: frequency: "5m" locations: - loc_us_east - loc_eu_central http: timeoutMs: 15000 followRedirects: true assertions: - type: status_code value: 200 browser: timeoutMs: 60000 screenshotMode: every_step alerts: - name: "default-alert" conditions: - type: consecutive_failures count: 3 channels: - "@ops-slack" groups: - name: "Critical APIs" frequency: "1m" locations: - loc_us_east - loc_us_west - loc_eu_west alerts: - name: "critical-api-alert" conditions: - type: consecutive_failures count: 2 - type: multi_location_failure minLocations: 2 windowSeconds: 300 channels: - "@ops-slack" - "@pagerduty" monitors: - name: "Auth API" type: http url: 
"https://api.acme.com/v1/auth/health" - name: "Payments API" type: http url: "https://api.acme.com/v1/payments/health" assertions: - type: status_code value: 200 - type: response_time max: 1000 monitors: - name: "Homepage" type: http url: "https://www.acme.com" - name: "Checkout Flow" type: browser script: "./monitors/checkout.ts" frequency: "10m" locations: - loc_us_east alerts: - name: "checkout-alert" conditions: - type: consecutive_failures count: 1 channels: - "@ops-slack" - "@oncall-email" slos: - name: "Auth API SLO" monitor: "Auth API" target: "99.95%" window: "30d" channels: - "@ops-slack" - name: "Homepage SLO" monitor: "Homepage" target: "99.9%" window: "7d" burnRateAlerts: true channels: - "@ops-slack" ``` --- ## CLI > Complete reference for the Yorker CLI — all commands, flags, and environment variables. # CLI The Yorker CLI manages monitors, alerts, SLOs, notification channels, and private locations. Use it for infrastructure-as-code deployments from `yorker.config.yaml`, or manage resources imperatively with CRUD commands. --- ## Installation ```bash npm install -g @yorker/cli ``` Or locally in a project: ```bash npm install --save-dev @yorker/cli ``` --- ## Authentication Generate an API key from **Settings > API Keys** in the dashboard, then set it: ```bash export YORKER_API_KEY=sk_your_key_here ``` --- ## Global Flags These flags are available on every command: | Flag | Description | |---|---| | `--json` | Output in structured JSON format. | | `--quiet` | Suppress non-essential output. | | `--no-color` | Disable colored output. | | `--verbose` | Enable verbose/debug output. | | `--api-url <url>` | Control plane URL (default: `https://app.yorkermonitoring.com`). Also settable via `YORKER_API_URL`. | | `--api-key <key>` | API key. Also settable via `YORKER_API_KEY`. | | `--help`, `-h` | Show help. | | `--version`, `-V` | Show version.
| --- ## Environment Variables | Variable | Required | Default | Description | |---|---|---|---| | `YORKER_API_KEY` | Yes | — | API key for authentication. Keys start with `sk_`. | | `YORKER_API_URL` | No | `https://app.yorkermonitoring.com` | Control plane base URL. Override for self-hosted or development. | --- ## Exit Codes | Code | Meaning | |---|---| | `0` | Success. | | `1` | General error (validation failure, API error, missing config). | | `2` | Authentication failure (invalid or missing API key). | | `3` | Plan/quota limit exceeded. | | `4` | Partial failure (some operations succeeded, others failed). | | `5` | Drift detected — remote resources were modified outside the CLI. | | `10` | `yorker status` reports one or more unhealthy monitors. | --- ## Commands ### `yorker init` Scaffold a new monitoring config in the current directory. Interactive prompts guide you through project name, first monitor URL, type, and frequency. In non-interactive contexts (CI, `--json`, or a non-TTY shell) you must pass `--url` (and optionally the other fields) as flags or the command errors out. ```bash yorker init [--name <name>] [--url <url>] [--type <type>] [--frequency <frequency>] [--format file|directory] [--force] ``` | Flag | Description | |---|---| | `--name <name>` | Project name (skips prompt). | | `--url <url>` | First monitor URL (skips prompt). Required in non-interactive contexts. | | `--type <type>` | Monitor type: `http`, `browser`, or `mcp` (skips prompt). Default: `http`. | | `--frequency <frequency>` | Check frequency, e.g. `30s`, `5m`, `1h` (skips prompt). Default: `5m`. | | `--format <file|directory>` | Output layout: `file` (a single `yorker.config.yaml`, the default) or `directory` (a `.yorker/` directory with one YAML file per resource). | | `--force` | Overwrite existing config without asking. | The generated config uses the two default locations `loc_us_east` and `loc_eu_central` and the frequency you choose.
HTTP monitors get a starter `status_code: 200` assertion; browser monitors get a `./monitors/<name>.ts` script stub; MCP monitors just get the endpoint URL. --- ### `yorker deploy` Push your local `yorker.config.yaml` to the control plane. Computes a diff against remote state, displays a Terraform-style plan, and applies changes. ```bash yorker deploy [--dry-run] [--prune] [--force] [--accept-remote] [--wait] ``` | Flag | Description | |---|---| | `--dry-run` | Show the deploy plan without applying changes. | | `--prune` | Delete remote resources not defined in the local config. | | `--force` | Overwrite remote changes (ignore drift). Local config wins. | | `--accept-remote` | Skip drifted resources, keeping their remote changes. | | `--wait` | After a successful create, poll for the first check result from each new monitor (3-minute overall timeout). Exits with code `4` (`PARTIAL_FAILURE`) if any new check's first result is a failure, or if waiting for a first result times out. Useful in CI to catch broken monitors before merging. | > `--force` and `--accept-remote` are mutually exclusive — pass one or the other, not both. #### Deploy pipeline 1. **Load** — reads `yorker.config.yaml` from the current directory. 2. **Interpolate** — resolves `{{secrets.NAME}}`, `{{env.NAME}}`, and `${NAME}` placeholders from environment variables. 3. **Validate** — parses every resource through shared Zod schemas. Fails with all errors collected. 4. **Resolve** — applies cascading defaults (defaults → group → monitor), reads browser script files, builds API payloads. 5. **Fetch remote state** — gets all checks, alerts, SLOs, channels, and maintenance windows from the API. 6. **Detect drift** — compares remote `updatedAt` timestamps against the stored deploy state. See [Drift detection](#drift-detection) below. 7. **Compute plan** — field-level diffs between local and remote state. 8. **Display plan** — Terraform-style plan output. 9.
**Apply** — if not `--dry-run`, applies changes in ordered phases. 10. **Save state** — writes `.yorker/.deploy-state.json` with config hashes and remote timestamps. #### Deploy phases Changes are applied in dependency order: | Phase | Action | Why | |---|---|---| | 0 | Create/update notification channels | Channels must exist before alerts or SLOs reference them. | | A | Delete SLOs | Must be deleted before their parent checks. | | B | Delete alerts | Must be deleted before their parent checks. | | C | Delete checks | Safe after dependent resources removed. | | D+E | Create and update checks | New checks get server-assigned IDs needed by alerts and SLOs. | | E2 | Sync check labels | Labels are a separate API, applied after checks exist so new check IDs are known. | | F | Create/update alerts | Alerts reference check IDs and channel IDs. | | G | Create/update SLOs | SLOs reference check IDs and channel IDs. | | H | Create/update maintenance windows | Maintenance windows reference check IDs, so they run after checks exist. | | I | Delete maintenance windows (only with `--prune`) | Stale maintenance windows are only removed when you opt in via `--prune`. | | Z | Delete channels | Runs last so alerts/SLOs referring to channels are gone first. Delete only runs when `--prune` is set, or when a channel's `type` changed in YAML (type changes require delete-and-recreate). | #### Plan output ``` Yorker deploy plan for "my-project" Channels: + CREATE "ops-slack" = UNCHANGED "team-email" Checks: + CREATE http "Users API" (60s, 3 locations) ~ UPDATE http "Homepage" ~ configJson.timeoutMs 30000 -> 15000 - DELETE http "Old Endpoint" = UNCHANGED http "Orders API" Alerts: + CREATE "api-down" (Users API) SLOs: ~ UPDATE "Homepage SLO" ~ targetBasisPoints 9990 -> 9995 Summary: 2 to create, 2 to update, 1 to delete, 2 unchanged ``` | Symbol | Meaning | |---|---| | `+` CREATE | Resource will be created. | | `~` UPDATE | Resource will be updated. Field-level diffs shown below. 
| | `-` DELETE | Resource will be deleted (requires `--prune` or type change). | | `=` UNCHANGED | No action needed. | #### Drift detection The CLI tracks the state of each resource after every deploy in `.yorker/.deploy-state.json`. On the next deploy, it compares remote `updatedAt` timestamps against the stored state to detect resources modified outside the CLI (e.g., via the web UI). | Local changed? | Remote changed? | Result | |---|---|---| | No | No | Unchanged — skip. | | Yes | No | Normal update — proceed. | | No | Yes | **Drift** — remote was edited outside the CLI. | | Yes | Yes | **Conflict** — both local config and remote were changed. | If drift or conflicts are detected, the deploy aborts with a report: ``` Warning: 1 drifted, 1 conflicting resource(s) modified outside config Checks: ! DRIFT "api-health" (remote changed since last deploy) Alerts: ! CONFLICT "api-health:latency" (both local and remote changed) Use --force to overwrite remote changes, or --accept-remote to keep them. ``` Resolution options: | Flag | Behavior | |---|---| | *(default)* | Abort with drift/conflict report. | | `--force` | Local config wins — overwrite remote changes. | | `--accept-remote` | Skip drifted/conflicting resources — keep remote as-is. | The deploy state file `.yorker/.deploy-state.json` is per-machine state. Add it to your `.gitignore` to avoid committing it. Every successful `yorker pull` rewrites this file with a fresh snapshot of remote state, so the next deploy treats everything as a clean baseline. --- ### `yorker diff` Show changes between local config and remote state without applying them. ```bash yorker diff ``` Displays the same Terraform-style plan as `yorker deploy --dry-run`. Designed for CI pipelines and quick checks — see [CI/CD Integration](/docs/guides/ci-cd) for full workflow examples. --- ### `yorker pull` Export remote monitors, alerts, SLOs, and notification channels to a local config. The reverse of `yorker deploy`. 
```bash
yorker pull [--output <path>] [--format file|directory] [--monitor <name-or-id>] [--force]
```

| Flag | Description |
|---|---|
| `-o, --output <path>` | Output path. Default: `yorker.config.yaml` in file format, or `.yorker/` in directory format. |
| `--format <format>` | Output layout: `file` (single `yorker.config.yaml`, default) or `directory` (a `.yorker/` tree with one YAML file per resource). |
| `--monitor <name-or-id>` | Pull a single monitor by name or ID. |
| `--force` | Overwrite existing config file and scripts. |

Fetches all monitors, alerts, SLOs, and notification channels and writes a config file that `yorker deploy` can consume. Browser monitor scripts are extracted to `./monitors/<slug>.ts`, where `<slug>` is the monitor name lowercased with non-alphanumeric characters replaced by hyphens (e.g., `Login Flow` becomes `login-flow.ts`).

> **`yorker pull` does not export maintenance windows and overwrites your entire config file.** If you manage maintenance windows through YAML, running `pull` will erase them from your local file (remote maintenance windows are not touched, but they will not reappear in the re-pulled YAML). If you need to round-trip safely, either avoid `pull` on projects that use `maintenanceWindows`, or re-add them to the YAML after each pull. A subsequent `yorker deploy` without `--prune` leaves remote maintenance windows alone; `yorker deploy --prune` will delete any remote maintenance windows that are not in the (now empty) local YAML.

---

### `yorker import`

> **Coming soon.** The `import` command is a scaffold today — running it reports the selected source, the source and output paths, and a link to the changelog where the release will land, then exits with code `1`. The importers themselves ship in a follow-up release.

Import existing monitors from another synthetic monitoring tool into a Yorker config. The resulting YAML is ready to deploy with `yorker deploy`, with inline `TODO` comments on anything that doesn't translate cleanly.
```bash
yorker import --from <source> [--output <dir>] [path]
```

| Flag / Argument | Description |
|---|---|
| `--from <source>` | **Required.** Source tool to import from. See supported sources below. |
| `--output <dir>` | Directory to write the generated Yorker YAML into. Default: `.` (current directory). |
| `[path]` | Path to the source repo or config file (e.g. `./src/__checks__` for Checkly, `./tests` for a Playwright project). |

**Supported sources** (planned):

| Source | `--from` value |
|---|---|
| Checkly | `checkly` |
| Datadog Synthetics | `datadog-synthetics` |
| Better Stack | `better-stack` |
| Pingdom | `pingdom` |
| Grafana Synthetic Monitoring | `grafana-synthetic-monitoring` |
| Playwright | `playwright` |

Example (once available):

```bash
# Import a Checkly project
yorker import --from checkly ./src/__checks__

# Import a Playwright test suite as browser monitors
yorker import --from playwright ./tests --output ./yorker/
```

Watch the [changelog](/changelog) for release dates. Until the importers land, the [comparison pages](/compare) document the migration story for each source.

---

### `yorker validate`

Validate your config without deploying. Runs the full load, interpolate, and validate pipeline without contacting the API.

```bash
yorker validate
```

Validation includes YAML syntax, Zod schema validation, secret interpolation, browser script file existence, frequency range checks, name uniqueness, SLO/alert cross-references, channel resolution, and maintenance-window time ranges.

```
Config is valid: 5 monitor(s), 2 SLO(s)
```

On failure, all errors are collected:

```
3 config error(s):
  monitors[Homepage].httpConfig.url: Invalid url
  monitors[Login Flow]: cannot read script file "./monitors/missing.ts"
  slos[API SLO]: references monitor "Nonexistent" which does not exist
```

---

### `yorker status`

Show a status overview of all monitors.

```bash
yorker status [--watch] [--interval <duration>]
```

| Flag | Description |
|---|---|
| `--watch` | Continuously refresh. In an interactive TTY the screen is cleared and redrawn in place. In `--json` mode each cycle emits a single newline-delimited JSON object. When stdout is piped (non-TTY, non-JSON), each snapshot is appended with a `--- Status @ <timestamp> ---` header. Press `Ctrl+C` to exit. |
| `--interval <duration>` | Refresh interval for `--watch` (default: `30s`, range: `5s` to `1h`). Ignored without `--watch`. |

Displays each monitor with its enabled/paused state, type, frequency, health, and recent-result sparkline. Exits with code `10` when any monitor is unhealthy.

---

### `yorker dashboard`

Open an interactive full-screen status dashboard in your terminal. Polls the API on a configurable interval and renders monitor health, recent results, and sparklines.

```bash
yorker dashboard [--interval <duration>]
```

| Flag | Description |
|---|---|
| `--interval <duration>` | Refresh interval (default: `30s`, range: `5s` to `1h`). |

**Keybindings** (list view):

| Key | Action |
|---|---|
| `↑` / `k` | Move selection up |
| `↓` / `j` | Move selection down |
| `Enter` | Open detail view for the selected monitor |
| `/` | Filter monitors by name |
| `r` | Refresh now (outside the polling interval) |
| `q` / `Ctrl+C` | Quit |

In the detail view, press `Esc` to return to the list, `q` to quit, or `r` to refresh.

This is the interactive TUI (singular `dashboard`). For installing HyperDX / ClickStack dashboard packs, see `yorker dashboards install` below.

---

### `yorker dashboards install`

Install pre-built Yorker dashboards into HyperDX (self-hosted) or ClickStack Cloud. See [Install Dashboards](/docs/guides/install-dashboards) for full setup.
```bash
# Self-hosted HyperDX
yorker dashboards install --hyperdx-api-key <key>

# ClickStack Cloud
yorker dashboards install --mode clickstack_cloud \
  --clickstack-org-id <org-id> \
  --clickstack-service-id <service-id> \
  --clickstack-api-key-id <key-id> \
  --clickstack-api-secret <secret>
```

| Flag | Description |
|---|---|
| `--mode <mode>` | `self_hosted` (default) or `clickstack_cloud` (env: `HYPERDX_MODE`) |
| `--hyperdx-api-key <key>` | HyperDX API key for self-hosted mode (env: `HYPERDX_API_KEY`) |
| `--hyperdx-url <url>` | Override HyperDX API URL for self-hosted (default: `https://www.hyperdx.io`) |
| `--clickstack-org-id <id>` | ClickHouse Cloud organization ID (env: `CLICKSTACK_ORG_ID`) |
| `--clickstack-service-id <id>` | ClickHouse Cloud service ID (env: `CLICKSTACK_SERVICE_ID`) |
| `--clickstack-api-key-id <id>` | ClickHouse Cloud API key ID (env: `CLICKSTACK_API_KEY_ID`) |
| `--clickstack-api-secret <secret>` | ClickHouse Cloud API key secret (env: `CLICKSTACK_API_SECRET`) |
| `--packs <packs>` | Comma-separated subset: `overview`, `deep-dive`, `status-page`, `trace-correlation` |
| `--force` | Skip duplicate check (may create additional copies) |

---

### `yorker log`

View config change history. Shows what changed, when, and how.

```bash
yorker log [--type <type>] [--name <name>] [--source <source>] [--limit <n>] [--offset <n>]
```

| Flag | Description |
|---|---|
| `--type <type>` | Filter by resource type (`check`, `alert`, `slo`, `channel`). |
| `--name <name>` | Filter by resource name (case-insensitive substring match). |
| `--source <source>` | Filter by change source (`ui`, `yaml`, `api`). |
| `--limit <n>` | Number of changes to show (default: 20, max: 200). |
| `--offset <n>` | Skip the first N changes for pagination (cannot be used together with `--name`). |

```
Config changes (showing 5):

Time    Action  Type     Name            Source
2m ago  create  check    api-health      yaml
5m ago  update  alert    api-health:p95  ui
1h ago  delete  channel  old-slack       yaml
2h ago  update  slo      availability    api
3h ago  create  check    login-flow      yaml
```

Use `--json` for machine-readable output in CI pipelines.
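If you post-process `--json` change records yourself, the same filter semantics can be mirrored in a few lines. A minimal sketch, assuming each record is a dict with `type`, `name`, and `source` keys matching the table columns above (the record shape is an assumption, not the documented JSON schema):

```python
def filter_changes(changes, rtype=None, name=None, source=None):
    """Mimic yorker log's --type / --name / --source filters on change records."""
    out = []
    for c in changes:
        if rtype is not None and c["type"] != rtype:
            continue
        # --name is documented as a case-insensitive substring match
        if name is not None and name.lower() not in c["name"].lower():
            continue
        if source is not None and c["source"] != source:
            continue
        out.append(c)
    return out

changes = [
    {"type": "check", "name": "api-health", "source": "yaml"},
    {"type": "alert", "name": "api-health:p95", "source": "ui"},
    {"type": "channel", "name": "old-slack", "source": "yaml"},
]
print(len(filter_changes(changes, name="API-HEALTH")))  # 2
```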
---

### `yorker test`

Run HTTP monitors locally against their configured URLs. Uses the same config pipeline as deploy, so all defaults, auth, and secrets are applied identically.

```bash
yorker test
```

```
Running monitors locally...

HTTP     Homepage ... 200 (142ms)
HTTP     Users API ... 200 (89ms)
Browser  Login Flow (4 steps)
         1. Navigate
         2. Log in
         3. Dashboard loads
         4. Check account
         Browser monitors require remote execution — use `yorker deploy` then check results
MCP      Docs Server
         MCP monitors require remote execution — use `yorker deploy` then check results
```

Only HTTP monitors are exercised locally. Browser and MCP monitors are listed (browser monitors also display extracted step markers) but skipped — deploy them and view results remotely.

---

### `yorker completions [shell]`

Generate shell completion scripts. Auto-detects your shell if the argument is omitted.

```bash
yorker completions       # auto-detect
yorker completions bash
yorker completions zsh
yorker completions fish
```

Pipe the output to the appropriate file for your shell. For example:

```bash
yorker completions zsh > ~/.zfunc/_yorker
```

---

## Monitor Commands

### `yorker monitors list`

List all monitors. The output includes each monitor's name, type, frequency, location count, and any labels attached to the check.

```bash
yorker monitors list [--type <type>] [--status <status>]
```

| Flag | Description |
|---|---|
| `--type <type>` | Filter by type: `http`, `browser`, or `mcp`. |
| `--status <status>` | Filter by status: `enabled` or `paused`. |

In `--json` mode, each monitor entry includes a `labels` array (omitted when label lookup failed, e.g. due to permissions or a transient API error).

---

### `yorker monitors get <name-or-id>`

Show detailed information about a monitor, including recent results.

```bash
yorker monitors get "Homepage"
yorker monitors get chk_abc123
```

Displays monitor details (ID, type, status, frequency, locations, URL or endpoint) and a table of recent results with response times and status codes.
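When scripting against `--json` output, every command wraps its payload in the standard `{ ok: true, data: ... }` envelope (documented under the analyze commands). A minimal consumer sketch; the `error` field name and the exact `data` payload shape are assumptions beyond what the docs state:

```python
import json

def unwrap(raw: str):
    """Unwrap the standard { ok, data } envelope emitted in --json mode.

    The "error" field consulted on failure is an assumption.
    """
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        raise RuntimeError(f"CLI reported failure: {envelope.get('error')}")
    return envelope["data"]

# e.g. raw = subprocess.run(["yorker", "monitors", "list", "--json"],
#                           capture_output=True, text=True).stdout
raw = '{"ok": true, "data": [{"name": "Homepage", "type": "http", "labels": ["prod"]}]}'
monitors = unwrap(raw)
print([m["name"] for m in monitors])  # ['Homepage']
```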
---

### `yorker monitors create`

Create a new HTTP monitor imperatively. Browser and MCP monitors must be created via `yorker deploy` from YAML.

```bash
yorker monitors create --name "API Health" --type http --url https://api.example.com/health [--method GET] [--frequency 5m] [--locations loc_us_east,loc_eu_central]
```

| Flag | Required | Description |
|---|---|---|
| `--name <name>` | Yes | Monitor name. |
| `--type <type>` | Yes | Must be `http`. |
| `--url <url>` | Yes | URL to monitor. |
| `--method <method>` | No | HTTP method (default: `GET`). |
| `--frequency <frequency>` | No | Check frequency (default: `5m`). |
| `--locations <ids>` | No | Comma-separated location IDs (default: all active hosted locations). |

---

### `yorker monitors edit <name-or-id>`

Edit an existing monitor.

```bash
yorker monitors edit "Homepage" --frequency 1m --add-location loc_eu_central
```

| Flag | Description |
|---|---|
| `--name <name>` | New monitor name. |
| `--frequency <frequency>` | New frequency. |
| `--add-location <id>` | Add a location. |
| `--remove-location <id>` | Remove a location. |
| `--yes` | Skip confirmation prompt. |

---

### `yorker monitors delete <name-or-id>`

Delete a monitor and all its results and alert rules.

```bash
yorker monitors delete "Old Endpoint" [--yes]
```

Requires confirmation unless `--yes` is passed.

---

### `yorker monitors pause <name-or-id>`

Pause a running monitor.

```bash
yorker monitors pause "Homepage"
```

---

### `yorker monitors resume <name-or-id>`

Resume a paused monitor.

```bash
yorker monitors resume "Homepage"
```

---

### `yorker monitors analyze <name-or-id>`

Run a deep insights analysis on a monitor. Sends recent results and context to the Yorker insights engine and returns a classified summary (type, severity, confidence, summary text).
```bash
yorker monitors analyze "Homepage"
yorker monitors analyze chk_abc123
yorker monitors analyze chk_abc123 --json
```

Output in human mode:

```
Type        degradation
Severity    warning
Confidence  0.87
Summary     Response times increased 3x over the last hour across US-East,
            correlated with an upstream dependency slowdown.
Model       claude-sonnet-4-20250514
Generated   Apr 9, 2026, 3:23:04 PM UTC

Deep analysis usage: 2/5 used for this resource.
```

In `--json` mode, emits the full validated response envelope (`insight`, `stale`, `deepAnalysisCount`, `deepAnalysisLimit`) under the standard `{ ok: true, data: ... }` envelope.

Deep analysis is rate-limited per monitor (5-minute cooldown between runs, 5 lifetime calls per monitor) and per team (50 per month). When rate-limited, the command exits with code `3` and prints a message explaining how long to wait or when the monthly quota resets.

---

## Results Commands

### `yorker results list <monitor>`

List check results for a monitor.

```bash
yorker results list "Homepage" [--limit 20] [--status success] [--since 24h]
```

| Flag | Description |
|---|---|
| `--limit <n>` | Number of results (default: `20`, max: `200`). |
| `--offset <n>` | Skip the first N results for pagination. Cannot be combined with `--status` or `--since`. |
| `--status <status>` | Filter: `success`, `failure`, `error`, `timeout`. Applied client-side. |
| `--since <duration>` | Only show results from this period (e.g. `1h`, `24h`, `7d`). Applied client-side. |

> `--offset` is server-side pagination; `--status` and `--since` are client-side filters. Mixing them would produce inconsistent page sizes, so the CLI rejects the combination. Use `--limit` and re-run with a later `--since` if you need to page through filtered results.

---

### `yorker results get <monitor> <result-id>`

Show full details for a specific check result.
```bash
yorker results get "Homepage" res_abc123
```

Includes timing breakdown (DNS, TLS, TTFB, content transfer), assertions (pass/fail), screenshots, Web Vitals (browser monitors), tool-call results (MCP monitors), network requests, and console errors.

---

### `yorker results tail <monitor>`

Live-stream new results as they arrive. Long-running command that polls at a configurable interval.

```bash
yorker results tail "Homepage" [--interval 30s]
```

| Flag | Description |
|---|---|
| `--interval <duration>` | Poll interval (default: `30s`, range: `5s` to `1h`). |

```
Tailing Homepage (every 30s, Ctrl+C to stop)

Apr 9, 2026, 12:34:56 PM UTC  ✓ 200      US East Coast  142ms
Apr 9, 2026, 12:35:26 PM UTC  ✓ 200      EU Central     89ms
Apr 9, 2026, 12:36:56 PM UTC  ✗ timeout  US East Coast  30000ms  connection timeout
```

Timestamps are formatted as localized US English strings using the team's configured timezone (defaults to UTC). In `--json` mode, outputs one JSON object per result (newline-delimited).

---

## Alert Commands

### `yorker alerts list`

List alert instances.

```bash
yorker alerts list [--all] [--monitor "Homepage"] [--state active,acknowledged]
```

| Flag | Description |
|---|---|
| `--all` | Include resolved and recovered alerts (default: active and acknowledged only). |
| `--monitor <name-or-id>` | Filter by monitor. |
| `--state <states>` | Comma-separated states: `active`, `acknowledged`, `recovered`, `resolved`. |

The command returns at most 100 alert instances per invocation (the most recent by start time). For larger historical windows, use `yorker alerts history` or the REST API with pagination.

---

### `yorker alerts ack <alert-instance-id>`

Acknowledge an active alert.

```bash
yorker alerts ack ainst_abc123
```

---

### `yorker alerts resolve <alert-instance-id>`

Manually resolve an alert.

```bash
yorker alerts resolve ainst_abc123
```

---

### `yorker alerts analyze <alert-instance-id>`

Run a deep insights analysis on an alert instance.
Sends context about the alert (monitor, recent results, correlations) to the Yorker insights engine and returns a classified summary with a type, severity, confidence score, and summary text.

```bash
yorker alerts analyze ainst_abc123
yorker alerts analyze ainst_abc123 --json
```

Output in human mode matches `yorker monitors analyze` — labeled fields for `Type`, `Severity`, `Confidence`, `Summary`, `Model`, and `Generated`, followed by a usage footer. In `--json` mode, emits the full validated response envelope (`insight`, `stale`, `deepAnalysisCount`, `deepAnalysisLimit`) under the standard `{ ok: true, data: ... }` envelope.

Deep analysis is rate-limited per alert instance (5-minute cooldown, 5 lifetime calls per instance) and per team (50 per month). When rate-limited, the command exits with code `3` and prints a message explaining how long to wait or when the monthly quota resets.

---

### `yorker alerts history`

Show alert history.

```bash
yorker alerts history [--monitor "Homepage"] [--since 7d] [--limit 20]
```

| Flag | Description |
|---|---|
| `--monitor <name-or-id>` | Filter by monitor. |
| `--since <duration>` | Time window (e.g. `30m`, `24h`, `7d`). |
| `--limit <n>` | Number of alerts (default: `20`, max: `200`). |

---

### `yorker alerts rules list`

List alert rules across monitors.

```bash
yorker alerts rules list [--monitor "Homepage"]
```

---

### `yorker alerts rules create`

Create a new alert rule with a single condition. For multi-condition rules, use `yorker.config.yaml` and `yorker deploy`.

```bash
yorker alerts rules create --monitor "Homepage" --condition "consecutive_failures >= 3" --channel nch_abc123 [--name "homepage-down"] [--severity critical]
```

| Flag | Required | Description |
|---|---|---|
| `--monitor <name-or-id>` | Yes | Monitor to attach the rule to. |
| `--condition <condition>` | Yes | Alert condition (see formats below). |
| `--channel <id>` | Yes | Notification channel ID (`nch_...`). |
| `--name <name>` | No | Optional rule name. |
| `--severity <severity>` | No | Severity for SSL/certificate conditions: `critical`, `warning`, or `info`. Ignored for non-certificate conditions. |

#### Condition formats

| Syntax | Type |
|---|---|
| `consecutive_failures >= 3` | Trigger after N consecutive failures. |
| `response_time > 5000` | Trigger if response time exceeds N ms. |
| `multi_location >= 2` | Trigger if N+ locations fail simultaneously. |
| `ssl_expiry < 14` | Trigger if the SSL cert expires within N days. |
| `ssl_certificate_changed` | Trigger when the leaf certificate fingerprint changes between runs. |
| `ssl_self_signed` | Trigger when a self-signed or untrusted-root certificate is detected. |
| `ssl_protocol < TLSv1.3` | Trigger when the TLS handshake negotiates a protocol older than the minimum. Allowed minimums: `TLSv1.2`, `TLSv1.3`. |

---

## Location Commands

The `yorker locations` command group manages private locations and runner keys.

### `yorker locations list`

List your private locations. By default only private locations are shown; pass `--all` to include the 14 hosted locations as well.

```bash
yorker locations list         # private locations only
yorker locations list --all   # private + hosted
```

| Flag | Description |
|---|---|
| `--all` | Include hosted locations in the output. |

Shows each location's ID, type (`hosted` / `private`), display name, region, health status, and (for private locations) last heartbeat.

---

### `yorker locations create`

Create a private location and auto-provision an initial runner key.

```bash
yorker locations create --name <slug> --display-name "<display name>" [--region <region>]
```

| Flag | Required | Description |
|---|---|---|
| `--name <slug>` | Yes | Unique lowercase slug (e.g. `staging-eu`). |
| `--display-name <name>` | Yes | Human-readable display name. |
| `--region <region>` | No | Region identifier (default: `private`). |

The runner key secret is printed **once** — save it immediately. The command also prints a ready-to-run Docker snippet for starting the runner.
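If you generate location names programmatically, it can help to validate the slug before calling the CLI. A sketch, assuming "lowercase slug" means lowercase letters and digits separated by single hyphens (inferred from the `staging-eu` example, not an authoritative spec):

```python
import re

# Assumed slug shape, inferred from the "staging-eu" example.
SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_location_name(name: str) -> bool:
    """Check a candidate --name value against the assumed slug shape."""
    return bool(SLUG_RE.fullmatch(name))

print(is_valid_location_name("staging-eu"))   # True
print(is_valid_location_name("Staging EU"))   # False
```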
---

### `yorker locations delete <location-id>`

Delete a private location. Fails if any monitors still reference it. Also revokes every runner key associated with the location. The argument is the location ID (`loc_...`).

```bash
yorker locations delete loc_staging_eu [--yes]
```

---

### `yorker locations keys list <location-id>`

List runner keys for a private location. Only the key ID, name, prefix, and last-used time are returned — the secret is never exposed again after creation.

```bash
yorker locations keys list loc_staging_eu
```

---

### `yorker locations keys create <location-id>`

Provision a new runner key for a private location. The secret is printed **once**, followed by a ready-to-run Docker snippet.

```bash
yorker locations keys create loc_staging_eu [--name "