
Observability

Klaxon's telemetry is unified under OpenTelemetry. Every component (server, auth, worker, web, mobile) emits traces + metrics + logs through OTLP into an in-cluster OTel Collector, which forwards to OneUptime. There's exactly one OneUptime credential in the system — the Collector holds it.

Architecture

┌─ browser (@klaxon/web) ──────────────────┐
│  @opentelemetry/sdk-trace-web            │
│  @opentelemetry/sdk-logs                 │   OTLP/HTTP + traceparent header
│  instrumentation-fetch/document-load     ├───────────────┐
└──────────────────────────────────────────┘               │
┌─ mobile (klaxon-mobile / Expo) ──────────┐               │
│  @opentelemetry/sdk-trace-base           │   OTLP/HTTP   │
│  manual fetch wrapper + screen spans     ├───────────────┤
└──────────────────────────────────────────┘               ▼
                                            ┌─────────────────────────────┐
┌─ klaxon-server ──────────────────────────┐│  OTel Collector (Deployment)│
│  klaxon-telemetry crate                  ││  OTLP gRPC :4317  ◄────────┤ Rust
│  traces + logs via OTLP/gRPC             ├┤  OTLP HTTP :4318  ◄────────┤ browser/mobile
└──────────────────────────────────────────┘│  prometheus receiver ──────┤ /metrics scrape
┌─ klaxon-auth ────────────────────────────┐│  processors: batch,        │
│  same klaxon-telemetry init              ├┤  memory_limiter, resource  │
└──────────────────────────────────────────┘│                            │
┌─ klaxon-server --worker ─────────────────┐│  exporter: otlphttp        │
│  + worker-specific metrics               ├┤  → OneUptime               │
└──────────────────────────────────────────┘└──────────┬─────────────────┘

                                              OneUptime OTLP ingestor

Signal paths:

  • Traces — each binary/app pushes OTLP to the Collector. Browser + mobile inject W3C traceparent on every fetch, so a single trace spans browser → klaxon-auth → klaxon-server when a user clicks through the UI.
  • Metrics — HTTP + worker metrics stay on the existing metrics crate + Prometheus /metrics endpoint. The Collector's prometheus receiver scrapes those and re-exports them as OTLP, avoiding the need for a bridge between the Rust metrics crate and the OTel SDK.
  • Logs — Rust uses opentelemetry-appender-tracing so every tracing::info! / warn! / error! event becomes an OTLP log record stamped with the active trace_id and span_id. Browser + mobile do the same via @opentelemetry/api-logs.
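The traceparent header the browser and mobile clients inject follows the W3C Trace Context wire format. A minimal standalone sketch (the real apps let the OpenTelemetry SDK build this via propagation.inject; this version only illustrates what ends up on the wire):

```typescript
import { randomBytes } from "node:crypto";

// W3C Trace Context traceparent: version-traceid-spanid-flags.
// Illustrative only; the actual clients never build this by hand.
function makeTraceparent(): string {
  const traceId = randomBytes(16).toString("hex"); // 32 hex chars
  const spanId = randomBytes(8).toString("hex");   // 16 hex chars
  return `00-${traceId}-${spanId}-01`;             // version 00, sampled flag 01
}
```

A receiving service continues the trace by parsing the trace and span IDs back out of this header, which is how a single trace spans browser → klaxon-auth → klaxon-server.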

Configuration

Rust (klaxon-server + klaxon-auth + worker)

  • OTEL_ENABLED (default: false): when false, only JSON stdout logging runs.
  • OTEL_EXPORTER_OTLP_ENDPOINT (default: http://localhost:4317): Collector gRPC endpoint.
  • OTEL_SERVICE_NAME (default: klaxon-server): service.name resource attribute.
  • DEPLOYMENT_ENVIRONMENT (default: development): deployment.environment resource attribute.
  • K8S_POD_NAME, K8S_NAMESPACE_NAME (default: unset): populated from the k8s downward API when deployed via Helm.

The Helm chart wires all of these automatically from values.yaml::otelCollector.enabled — operators don't set the env vars individually.

Web (@klaxon/web)

Vite env vars, baked at build time:

  • VITE_OTEL_ENDPOINT — OTLP/HTTP base URL. Defaults to /otel, i.e. same origin as the web UI (ingress routes /otel/* to the Collector).
  • VITE_APP_VERSION — populated from your CI build ID, becomes service.version.

Local dev: the Vite config proxies /otel to localhost:4318 so pnpm --filter @klaxon/web dev works against docker compose up -d without CORS gymnastics.
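The proxy rule has roughly this shape (an assumed sketch, not the repo's actual vite.config.ts; check that file for the real entry):

```typescript
// Assumed shape of the dev-server proxy entry: browser POSTs to /otel/* on
// the dev origin are forwarded to the local Collector's OTLP/HTTP listener.
const otelProxy = {
  "/otel": {
    target: "http://localhost:4318", // local OTel Collector, OTLP/HTTP port
    changeOrigin: true,
    // strip the /otel prefix so the Collector sees the bare OTLP paths
    rewrite: (path: string) => path.replace(/^\/otel/, ""),
  },
};
```

Because the browser only ever talks to its own origin, no CORS preflight is involved in local dev.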

Mobile (klaxon-mobile, Expo)

Expo EXPO_PUBLIC_* env vars, baked at EAS build time:

  • EXPO_PUBLIC_OTEL_ENDPOINT — OTLP/HTTP base URL, e.g. https://klaxon.sh/otel. Defaults to http://localhost:4318 for expo start.
  • EXPO_PUBLIC_ENVIRONMENT — becomes deployment.environment.

OneUptime setup

  1. Create (or open) a OneUptime project and navigate to Settings → Telemetry.

  2. Copy the OTLP ingestion URL (typically https://oneuptime.com/otlp for OneUptime Cloud) and the project token.

  3. Populate the Helm values:

    yaml
    secrets:
      oneuptimeOtlpEndpoint: "https://oneuptime.com/otlp"
      oneuptimeOtlpToken: "<your project token>"

    Or via Pulumi:

    bash
    pulumi config set klaxon:oneuptimeOtlpEndpoint https://oneuptime.com/otlp
    pulumi config set --secret klaxon:oneuptimeOtlpToken <your token>
  4. helm upgrade --install klaxon deploy/helm/klaxon -f values.yaml.

  5. Verify in OneUptime that klaxon-server, klaxon-auth, klaxon-worker, klaxon-web, klaxon-mobile all appear under Services, and that metrics with names like http_requests_total and klaxon_worker_task_duration_seconds show up.

Auth format

OneUptime authenticates OTLP/HTTP with an HTTP Authorization: Basic <token> header. The Collector config wraps ${env:ONEUPTIME_OTLP_TOKEN} in that header — operators just supply the raw token. If your self-hosted OneUptime install uses a different scheme (e.g. a custom header), edit exporters.otlphttp/oneuptime.headers in deploy/helm/klaxon/templates/otel-collector-configmap.yaml directly.
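As an illustration of what the Collector ends up attaching to each export (sketch only; the Collector itself does this via its YAML config, not code):

```typescript
// Illustrative only: the Authorization header sent with each OTLP/HTTP export.
// OneUptime expects the raw project token after "Basic ", not a
// base64-encoded user:password pair as in standard HTTP Basic auth.
function oneuptimeAuthHeader(token: string): Record<string, string> {
  return { Authorization: `Basic ${token}` };
}
```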

Local development

bash
docker compose up -d   # starts postgres + redis + otel-collector

The local Collector is otel/opentelemetry-collector-contrib with a debug exporter (see deploy/otel-collector-local.yaml), so everything sent to it is pretty-printed to the Collector's stdout:

bash
docker compose logs -f otel-collector

Run the Rust binaries against it:

bash
OTEL_ENABLED=true \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
DEPLOYMENT_ENVIRONMENT=development \
cargo run -p klaxon-server

Run the web dev server:

bash
pnpm --filter @klaxon/web dev
# opens http://localhost:5173 — the Vite proxy forwards /otel → 4318

Run Expo on a physical device (LAN):

bash
cd apps/klaxon-mobile
EXPO_PUBLIC_OTEL_ENDPOINT=http://<your-laptop-lan-ip>:4318 \
  npx expo start

What's instrumented

Server (klaxon-server)

  • Every HTTP request gets a root span via track_metrics middleware (crates/klaxon-server/src/metrics.rs). Attributes: method, path, request_id, status, duration_ms. The middleware also populates http_requests_total + http_request_duration_seconds Prometheus metrics.
  • #[tracing::instrument] on all handler functions — every DB transaction, every MCP tool call, every WebSocket lifecycle event gets a child span.
  • Two business metrics: klaxon_mcp_tool_calls_total{tool,success}, klaxon_active_websockets.

Auth (klaxon-auth)

  • Uses the shared klaxon-telemetry crate, so distributed traces from the browser (via the traceparent header) stitch together across the OAuth consent flow and subsequent /api/* calls.
  • No custom metrics yet.

Worker (klaxon-server --worker)

  • Each sweep (snooze_sweep, auto_archive_sweep, push_batch, webhook_batch, email_notifications, session_cleanup, audit_retention) gets a #[tracing::instrument] span and updates two metrics:
    • klaxon_worker_task_duration_seconds{task} histogram
    • klaxon_worker_task_failures_total{task} counter
  • klaxon_notification_queue_depth gauge sampled every 15 s from notification_queue.

Web (@klaxon/web)

  • Auto-instrumented via @opentelemetry/instrumentation-fetch + -xml-http-request + -document-load.
  • Every apiFetch call also gets a manual wrapping span with http.method + http.url + http.status_code so queries / filters can run against business-level attributes without needing to know the auto-instrumentation's span names.
  • log.* calls in packages/common/src/logger.ts emit OTLP log records + mirror to console.
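The manual wrapping span around apiFetch has roughly this shape (the real code obtains spans from the @opentelemetry/api tracer; the minimal SketchSpan interface and function names here are stand-ins so the attribute names are concrete):

```typescript
// Stand-in for an OpenTelemetry span; real code uses the SDK tracer.
interface SketchSpan {
  setAttribute(key: string, value: string | number): void;
  end(): void;
}

// Wrap an HTTP call in a manual span carrying business-level attributes,
// independent of whatever span names the fetch auto-instrumentation uses.
async function tracedApiFetch(
  startSpan: (name: string) => SketchSpan,
  doFetch: (url: string) => Promise<{ status: number }>,
  url: string,
  method = "GET",
): Promise<{ status: number }> {
  const span = startSpan("apiFetch");
  span.setAttribute("http.method", method);
  span.setAttribute("http.url", url);
  try {
    const res = await doFetch(url);
    span.setAttribute("http.status_code", res.status);
    return res;
  } finally {
    span.end(); // always end the span, even if doFetch throws
  }
}
```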

Mobile (klaxon-mobile)

  • Manual fetch wrapper (apps/klaxon-mobile/lib/api.ts) injects traceparent via propagation.inject.
  • screen.load span per route change, driven by usePathname in _layout.tsx.
  • Same log.* shape as the web app.

Request ID

Every server response includes an x-request-id header: if the client sends one it is propagated; otherwise a UUID v4 is generated. Use it to correlate JSON log entries (the request_id field), OTel trace spans (the request_id attribute), and client-side records.
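The propagate-or-generate rule reads roughly like this (the server is Rust; this TypeScript rendering of the assumed logic is only for illustration):

```typescript
import { randomUUID } from "node:crypto";

// Reuse the client-supplied x-request-id when present; mint a UUID v4 otherwise.
function resolveRequestId(incoming: string | null | undefined): string {
  return incoming && incoming.trim().length > 0 ? incoming : randomUUID();
}
```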

Now that trace_id is present in every log record, request_id serves mostly for operators grepping kubectl logs — OneUptime's UI jumps from span to logs via trace_id directly.

Scrape config (Prometheus, if you have one locally)

yaml
scrape_configs:
  - job_name: klaxon
    static_configs:
      - targets: ["localhost:3000", "localhost:3001"]
    metrics_path: /metrics
    scrape_interval: 15s

In production the OTel Collector does this scrape automatically via the prometheus receiver in deploy/helm/klaxon/templates/otel-collector-configmap.yaml.

Debug tips

  • No spans reaching OneUptime? Check kubectl logs <pod>-otel-collector. If you see "failed to push data to exporter", the ONEUPTIME_OTLP_TOKEN is wrong or the endpoint is unreachable.
  • trace_id missing from log records? The layer ordering in klaxon-telemetry::init is load-bearing (OpenTelemetryLayer before the appender). Regressing this will let logs pass through without span context. Covered by a unit test in the crate.
  • Browser OTLP being CORS-blocked? The Collector's otlp.http.cors.allowed_origins must include your web origin — edit values.yaml::otelCollector.cors.allowedOrigins.
  • Eyeballing incoming signals in cluster? Flip values.yaml::otelCollector.debugExporter.enabled = true and redeploy; the Collector will then log pretty-printed spans/logs alongside forwarding them to OneUptime. Disable before production (it logs full payloads).
  • RN app not producing spans? Make sure EXPO_PUBLIC_OTEL_ENDPOINT is reachable from the device (not just the Metro dev server) — physical devices need your laptop's LAN IP, not localhost.