Oct 20, 2018 · 2 min read
Node.js Observability with OpenTelemetry
Node.js · Observability
Observability gives you visibility into what your system is doing in production. With OpenTelemetry (OTel), you can standardize traces, metrics, and logs across services.
Core Signals
- Traces: request flow across services
- Metrics: time-series measurements (latency, throughput, error rate)
- Logs: event details and context
Why OpenTelemetry?
- Vendor-neutral instrumentation
- Unified semantic conventions
- Works with most backends (Jaeger, Tempo, Datadog, New Relic, etc.)
Basic Node.js Setup
```ts
import { NodeSDK } from '@opentelemetry/sdk-node'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // Full OTLP/HTTP traces URL, e.g. http://localhost:4318/v1/traces
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
})

// Start the SDK before application modules load so they can be patched
sdk.start()
```
Instrument HTTP + Database
Auto-instrumentation can capture:
- inbound HTTP requests
- outbound HTTP/fetch calls
- PostgreSQL/MySQL/Redis operations
This helps identify where latency is spent (network, DB, external API, etc.).
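Once spans are collected, attributing latency is mostly a grouping exercise. The sketch below is illustrative only: `SpanRecord` and its `category` field are hypothetical simplifications, not OpenTelemetry SDK types, and the sample durations are made up. It also assumes the spans are non-overlapping leaf spans, since summing nested spans would double-count time.

```typescript
// Hypothetical, simplified span record for illustration only;
// this is not the OpenTelemetry SDK's span type.
interface SpanRecord {
  name: string
  category: 'http' | 'db' | 'external' | 'internal'
  durationMs: number
}

// Sum span durations per category to see where a request spent its time.
function latencyByCategory(spans: SpanRecord[]): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const span of spans) {
    totals[span.category] = (totals[span.category] ?? 0) + span.durationMs
  }
  return totals
}

// Made-up sample data for one request.
const breakdown = latencyByCategory([
  { name: 'GET /checkout', category: 'http', durationMs: 12 },
  { name: 'SELECT cart', category: 'db', durationMs: 48 },
  { name: 'POST payment-api', category: 'external', durationMs: 130 },
  { name: 'UPDATE orders', category: 'db', durationMs: 22 },
])
console.log(breakdown)
```

In a real setup the tracing backend usually provides this view; the point is only that each auto-instrumented span already carries the data needed for it.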
Add Manual Spans for Business Logic
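```ts
import { trace } from '@opentelemetry/api'

// The tracer name is illustrative; use your service or module name.
const tracer = trace.getTracer('checkout')

// calculateTotals is the application's own function, assumed to exist.
async function calculateTotalsTraced(cart: { items: unknown[] }) {
  const span = tracer.startSpan('checkout.calculateTotals')
  try {
    const result = await calculateTotals(cart)
    span.setAttribute('cart.items', cart.items.length)
    return result
  } finally {
    span.end() // always end the span, even if calculateTotals throws
  }
}
```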
Manual spans make traces meaningful beyond framework-level events.
Propagation Across Services
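Trace context travels between services in the W3C `traceparent` header, which has the form `version-traceid-parentid-flags`. OpenTelemetry's propagators normally read and write this header for you; the sketch below only illustrates the format. The names `TraceParent`, `parseTraceParent`, and `formatTraceParent` are hypothetical helpers, and the parser handles only the version `00` layout.

```typescript
// Parsed form of a W3C Trace Context `traceparent` header.
interface TraceParent {
  version: string
  traceId: string // 32 lowercase hex characters
  parentId: string // 16 lowercase hex characters
  sampled: boolean
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Returns null when the header does not match the expected layout.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim())
  if (match === null) return null
  const [, version, traceId, parentId, flags] = match
  // Version ff and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null
  // The lowest flag bit marks the trace as sampled.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 }
}

// Serializes the context for an outgoing request (only the sampled flag is kept).
function formatTraceParent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.parentId}-${tp.sampled ? '01' : '00'}`
}

const incoming = parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01')
if (incoming !== null) {
  // A downstream call keeps the trace ID and substitutes the caller's own span ID.
  const outgoing = formatTraceParent({ ...incoming, parentId: 'b7ad6b7169203331' })
  console.log(outgoing)
}
```

If a proxy, queue, or hand-written HTTP client drops this header, the downstream service starts a new trace and the request appears as two unrelated traces.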
Ensure the traceparent header is propagated on calls between services.
Production Tips
- Sample intelligently (e.g., 10% baseline, 100% on errors)
- Tag spans with tenant/user-safe identifiers
- Redact PII in attributes/logs
- Set SLO-based alerts from metrics
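Two of the tips above can be sketched in a few lines. Keeping every error trace means the decision has to be made after the trace finishes (tail-based sampling, typically done in a collector), so the example works on a hypothetical `TraceSummary` rather than an OpenTelemetry type. The list of sensitive key fragments is likewise an assumption to adapt to your own data.

```typescript
// Hypothetical summary of a finished trace; not an OpenTelemetry type.
interface TraceSummary {
  traceId: string // 32 lowercase hex characters
  hasError: boolean
}

// Keep every trace with an error, plus a deterministic baseline share of the rest.
function shouldKeepTrace(trace: TraceSummary, baselineRatio: number = 0.1): boolean {
  if (trace.hasError) return true
  // Map the first 8 hex digits of the trace ID onto [0, 1) so that every
  // service makes the same decision for the same trace.
  const bucket = parseInt(trace.traceId.slice(0, 8), 16) / 0x100000000
  return bucket < baselineRatio
}

type AttributeValue = string | number | boolean

// Assumed list of sensitive key fragments; adjust to your own data model.
const SENSITIVE_KEY = /email|phone|password|token|ssn/i

// Returns a copy of the attributes with sensitive values masked.
function redactAttributes(attrs: Record<string, AttributeValue>): Record<string, AttributeValue> {
  const safe: Record<string, AttributeValue> = {}
  for (const key of Object.keys(attrs)) {
    safe[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : attrs[key]
  }
  return safe
}

console.log(shouldKeepTrace({ traceId: 'ffffffff' + 'ab'.repeat(12), hasError: true }))
console.log(redactAttributes({ 'user.email': 'a@example.com', 'cart.items': 3 }))
```

Matching on key names is a coarse filter. It does not catch PII embedded in free-text values, so it complements, rather than replaces, care about what gets recorded in the first place.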
Golden Dashboard
Start with a dashboard including:
- P50/P95/P99 latency
- error rate by route
- top slow spans
- upstream dependency health
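To make the first two panels concrete, here is a rough sketch of the arithmetic behind them. The `RequestRecord` shape is hypothetical, and counting only 5xx responses as errors is an assumption. Metrics backends usually estimate percentiles from histograms rather than raw samples, so treat this as an explanation of the numbers, not a way to build the dashboard.

```typescript
// Hypothetical request record; in practice these values come from your metrics backend.
interface RequestRecord {
  route: string
  status: number
  durationMs: number
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of empty sample')
  const sorted = values.slice().sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100))
  return sorted[rank - 1]
}

// Share of requests per route that ended with a 5xx status.
function errorRateByRoute(records: RequestRecord[]): Map<string, number> {
  const counts = new Map<string, { total: number; errors: number }>()
  for (const r of records) {
    const c = counts.get(r.route) ?? { total: 0, errors: 0 }
    c.total += 1
    if (r.status >= 500) c.errors += 1
    counts.set(r.route, c)
  }
  const rates = new Map<string, number>()
  counts.forEach((c, route) => rates.set(route, c.errors / c.total))
  return rates
}

// Made-up sample data.
const records: RequestRecord[] = [
  { route: '/checkout', status: 200, durationMs: 120 },
  { route: '/checkout', status: 500, durationMs: 900 },
  { route: '/cart', status: 200, durationMs: 40 },
]
const durations = records.map((r) => r.durationMs)
console.log(percentile(durations, 50), percentile(durations, 99))
console.log(errorRateByRoute(records))
```

Looking at P95 and P99 next to P50 matters because a healthy median can hide a slow tail that only a few routes or tenants experience.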
Final Takeaway
Observability is not just tooling—it is feedback for architecture decisions. OpenTelemetry gives a practical baseline to understand performance, reliability, and failure behavior in Node.js systems.
Written by Anant Kumar
Systems Engineer & Full Stack Developer