Node.js Observability with OpenTelemetry

Written by: Anant

Software Engineer | Systems & Web

Observability gives you visibility into what your system is doing in production. With OpenTelemetry (OTel), you can standardize traces, metrics, and logs across services.

Core Signals

  • Traces: request flow across services
  • Metrics: time-series measurements (latency, throughput, error rate)
  • Logs: event details and context

Why OpenTelemetry?

  • Vendor-neutral instrumentation
  • Unified semantic conventions
  • Works with most backends (Jaeger, Tempo, Datadog, New Relic, etc.)

Basic Node.js Setup

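ts
// Assumes OTLP over HTTP; a gRPC exporter package also exists.
import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'

// Load this file before the rest of the app so instrumentation can patch modules.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // If unset, the exporter falls back to its default endpoint.
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
})
sdk.start()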
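
The endpoint passed to the exporter usually comes from the environment. As a rough sketch of how the standard OTLP variables relate to each other over HTTP: a signal-specific variable is used as given, while the generic `OTEL_EXPORTER_OTLP_ENDPOINT` is treated as a base URL with `/v1/traces` appended. `resolveTracesEndpoint` and the `Env` type are hypothetical names for illustration only; the exporter already performs this resolution itself.

```typescript
// Environment shape; in Node.js this would be process.env.
type Env = { [key: string]: string | undefined }

// Default OTLP/HTTP traces endpoint for a collector running locally.
const DEFAULT_TRACES_ENDPOINT = 'http://localhost:4318/v1/traces'

// Hypothetical helper, not part of any OpenTelemetry package.
function resolveTracesEndpoint(env: Env): string {
  // A signal-specific endpoint is used exactly as given.
  const specific = env['OTEL_EXPORTER_OTLP_TRACES_ENDPOINT']
  if (specific) return specific

  // A generic endpoint is a base URL, so the traces path is appended.
  const generic = env['OTEL_EXPORTER_OTLP_ENDPOINT']
  if (generic) return generic.replace(/\/+$/, '') + '/v1/traces'

  return DEFAULT_TRACES_ENDPOINT
}
```

Calling `resolveTracesEndpoint(process.env)` at startup and logging the result can help confirm where spans are being sent.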

Instrument HTTP + Database

Auto-instrumentation can capture:

  • inbound HTTP requests
  • outbound HTTP/fetch calls
  • PostgreSQL/MySQL/Redis operations

This helps identify where latency is spent (network, DB, external API, etc.).
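
To make the "where is latency spent" question concrete, here is a minimal sketch that groups span durations by category. `SpanRecord` is a hypothetical, simplified shape (not an OpenTelemetry SDK type), and the category labels are assumptions. Summing durations counts parallel spans twice, so treat the totals as a rough signal rather than an exact breakdown.

```typescript
// Hypothetical, simplified span record; real exported spans carry more fields.
interface SpanRecord {
  name: string
  category: string // e.g. 'db', 'http.client', 'internal'
  durationMs: number
}

// Sum span durations per category so the dominant cost stands out.
function latencyByCategory(spans: SpanRecord[]): { [category: string]: number } {
  const totals: { [category: string]: number } = {}
  for (const span of spans) {
    totals[span.category] = (totals[span.category] || 0) + span.durationMs
  }
  return totals
}

// Return the category with the largest total, or undefined when there are no spans.
function slowestCategory(spans: SpanRecord[]): string | undefined {
  const totals = latencyByCategory(spans)
  let worst: string | undefined = undefined
  for (const category of Object.keys(totals)) {
    if (worst === undefined || totals[category] > totals[worst]) worst = category
  }
  return worst
}

const exampleSpans: SpanRecord[] = [
  { name: 'SELECT orders', category: 'db', durationMs: 120 },
  { name: 'GET /rates', category: 'http.client', durationMs: 45 },
  { name: 'SELECT users', category: 'db', durationMs: 30 },
]
```

For `exampleSpans`, the database category dominates, which is the kind of conclusion a trace view lets you reach visually.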

Add Manual Spans for Business Logic

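ts
import { trace } from '@opentelemetry/api'

// 'checkout' is an assumed tracer name; Cart and calculateTotals come from the app.
const tracer = trace.getTracer('checkout')

async function calculateTotalsTraced(cart: Cart) {
  const span = tracer.startSpan('checkout.calculateTotals')
  try {
    span.setAttribute('cart.items', cart.items.length)
    return await calculateTotals(cart)
  } finally {
    // Always end the span, even if calculateTotals throws.
    span.end()
  }
}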

Manual spans make traces meaningful beyond framework-level events.

Propagation Across Services

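Ensure the W3C Trace Context `traceparent` header is forwarded on every outbound call so traces remain connected end-to-end. Without propagation, each service starts a new trace, and its spans appear isolated from the request that caused them.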
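
The OpenTelemetry HTTP instrumentation normally handles this header for you, but it helps to know what is inside it when debugging broken traces. The sketch below parses and rebuilds a version `00` `traceparent` value (`version-traceid-parentid-flags`). The function names are hypothetical, and the parser deliberately ignores future header versions and any flag bits other than "sampled".

```typescript
// Fields of a W3C Trace Context `traceparent` header (version 00).
interface TraceParent {
  traceId: string  // 32 lowercase hex chars
  parentId: string // 16 lowercase hex chars (the caller's span ID)
  sampled: boolean // lowest bit of the trace-flags byte
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Parse a header value; returns null when it is malformed.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim())
  if (!match) return null
  const traceId = match[1]
  const parentId = match[2]
  const flags = match[3]
  // All-zero IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null
  return { traceId: traceId, parentId: parentId, sampled: (parseInt(flags, 16) & 1) === 1 }
}

// Build the header to send downstream: same trace ID, this service's span ID as parent.
function formatTraceParent(traceId: string, spanId: string, sampled: boolean): string {
  return '00-' + traceId + '-' + spanId + '-' + (sampled ? '01' : '00')
}
```

The key point is that the trace ID stays constant across hops while the parent ID changes at each service; if a proxy or client library drops the header, the downstream service has nothing to attach its spans to.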

Production Tips

  • Sample intelligently (e.g., 10% baseline, 100% on errors)
  • Tag spans with tenant or user identifiers that are safe to store (e.g., opaque IDs, not emails)
  • Redact PII in attributes/logs
  • Set SLO-based alerts from metrics
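
Two of these tips can be sketched in a few lines. Keeping 100% of error traces means the decision has to happen after the outcome is known, which is usually done with tail-based sampling (for example in a collector) rather than in the application. The code below is an illustrative sketch, not the SDK's sampler: `TraceSummary`, `shouldKeepTrace`, `redactAttributes`, and the `PII_KEYS` list are all assumptions made for this example.

```typescript
// Hypothetical summary of a finished trace, as a tail sampler would see it.
interface TraceSummary {
  traceId: string // 32 hex chars
  hasError: boolean
}

// Keep every trace with an error; keep a deterministic fraction of the rest.
function shouldKeepTrace(trace: TraceSummary, baselineRatio: number): boolean {
  if (trace.hasError) return true
  // Map the first 8 hex chars of the trace ID to [0, 1) so the same trace
  // always gets the same decision.
  const bucket = parseInt(trace.traceId.slice(0, 8), 16) / 0x100000000
  return bucket < baselineRatio
}

// Attribute keys treated as PII here; this list is an assumption, adjust per app.
const PII_KEYS = ['user.email', 'user.phone', 'http.request.header.authorization']

type Attributes = { [key: string]: string | number | boolean }

// Return a copy with PII values replaced, leaving the input untouched.
function redactAttributes(attrs: Attributes): Attributes {
  const safe: Attributes = {}
  for (const key of Object.keys(attrs)) {
    safe[key] = PII_KEYS.indexOf(key) >= 0 ? '[REDACTED]' : attrs[key]
  }
  return safe
}
```

With `baselineRatio` set to `0.1`, roughly one in ten healthy traces is kept, assuming trace IDs are uniformly random. Redacting by key name only catches known fields, so free-text values still need review.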

Golden Dashboard

Start with a dashboard including:

  • P50/P95/P99 latency
  • error rate by route
  • top slow spans
  • upstream dependency health
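
For intuition about what the first two panels compute, here is a small sketch over raw request samples. Real backends typically derive percentiles from histograms rather than raw values; `RequestRecord` and both function names are hypothetical, and the percentile uses the simple nearest-rank definition.

```typescript
// Hypothetical per-request record derived from spans or access logs.
interface RequestRecord {
  route: string
  durationMs: number
  isError: boolean
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN
  const sorted = values.slice().sort(function (a, b) { return a - b })
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100))
  return sorted[rank - 1]
}

// Fraction of failed requests per route.
function errorRateByRoute(records: RequestRecord[]): { [route: string]: number } {
  const total: { [route: string]: number } = {}
  const failed: { [route: string]: number } = {}
  for (const r of records) {
    total[r.route] = (total[r.route] || 0) + 1
    if (r.isError) failed[r.route] = (failed[r.route] || 0) + 1
  }
  const rates: { [route: string]: number } = {}
  for (const route of Object.keys(total)) {
    rates[route] = (failed[route] || 0) / total[route]
  }
  return rates
}
```

Calling `percentile(durations, 50)`, `percentile(durations, 95)`, and `percentile(durations, 99)` on the same window shows why the tail matters: a healthy median can hide a slow P99.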

Final Takeaway

Observability is not just tooling; it is feedback for architecture decisions. OpenTelemetry gives a practical baseline to understand performance, reliability, and failure behavior in Node.js systems.


© 2026 Anant. All rights reserved.
