OpenTelemetry in Rust: From Zero to Production Observability

Why Observability Matters

You ship a service. It runs. Users start complaining that things are slow, or that something fails randomly, but only in production. You have no idea what is happening inside the process at runtime.

This is the observability problem. You need a window into your running system, one that tells you what happened (logs), how your system performed over time (metrics), and why a specific request was slow or broken (traces).

OpenTelemetry is the industry-standard toolkit for collecting all three signals in a vendor-neutral way, so you can send them to Jaeger, Grafana, Datadog, or any OTLP-compatible backend without rewriting your instrumentation.

Part 1: The Three Signals

Before writing a single line of code, you need to understand what you are collecting and why.

Logs

A log is a timestamped, human-readable record of something that happened.

2026-03-12T10:00:01Z INFO  user created user_id=42
2026-03-12T10:00:02Z ERROR database connection failed err="connection refused"

Logs are the oldest form of observability. They are cheap to write and easy to understand, but they are hard to search at scale and impossible to correlate across services without extra plumbing.

In Rust, the standard way to write logs is the tracing crate:

use tracing::{info, warn, error};

info!("user created");
warn!(user_id = 42, "quota almost reached");   // structured key-value fields
error!(err = %e, "database connection failed");

The % sigil tells tracing to format the value using its Display implementation. ? uses Debug. Undecorated values must implement tracing::Value.
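To make the two sigils concrete, here is a small sketch that uses only the standard library to show the renderings that % and ? select between. DbError is a hypothetical type invented for this illustration, and tracing itself is not involved.

```rust
use std::fmt;

// Hypothetical error type, used only for illustration.
#[derive(Debug)]
struct DbError {
    code: u16,
    reason: String,
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (code {})", self.reason, self.code)
    }
}

/// The text a field written as `err = %e` would carry: the Display rendering.
fn display_rendering(e: &DbError) -> String {
    format!("{e}")
}

/// The text a field written as `err = ?e` would carry: the Debug rendering.
fn debug_rendering(e: &DbError) -> String {
    format!("{e:?}")
}
```

Display is usually the better choice for messages meant for humans; Debug is handy when you want the full structure of a value.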

Metrics

A metric is a numeric measurement sampled over time. It answers questions like:

How many requests per second is the service handling?
What is the p99 latency of the /checkout endpoint?
How many background jobs are in the queue right now?

There are three core metric types you will use:

Type	When to use	Example
Counter	Things you count up; never go down	`http.requests.total`
Gauge	Values that go up and down	`queue.depth`, `memory.used`
Histogram	Distribution of a value (latency, size, duration)	`request.duration_ms`

Unlike logs, metrics are aggregated. You do not keep every individual sample; you keep statistical summaries (counts, sums, bucket counts). This makes them cheap to store and fast to query even at high volume.
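The sketch below illustrates what "aggregated" means for a histogram. It is a simplified, standard-library-only model, not the OTel SDK's implementation: the Histogram type, its bucket bounds, and the interpolation rule are assumptions made for this example.

```rust
/// A minimal explicit-bucket histogram. It stores aggregates only,
/// never the individual samples.
struct Histogram {
    /// Upper bounds of the finite buckets, sorted ascending.
    bounds: Vec<f64>,
    /// counts[i] covers (bounds[i-1], bounds[i]]; the extra last slot is overflow.
    counts: Vec<u64>,
    count: u64,
    sum: f64,
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len() + 1];
        Self { bounds, counts, count: 0, sum: 0.0 }
    }

    /// Recording a sample updates three numbers and then forgets the value.
    fn record(&mut self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|bound| value <= *bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.count += 1;
        self.sum += value;
    }

    /// Estimates quantile `q` (0.0..=1.0) by interpolating linearly inside
    /// the bucket that contains it. Assumes values are non-negative.
    fn quantile(&self, q: f64) -> Option<f64> {
        if self.count == 0 || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let rank = q * self.count as f64;
        let mut seen = 0u64;
        for (i, &c) in self.counts.iter().enumerate() {
            if c > 0 && (seen + c) as f64 >= rank {
                // The overflow bucket has no upper bound, so report the
                // highest finite bound instead.
                if i == self.bounds.len() {
                    return self.bounds.last().copied();
                }
                let lower = if i == 0 { 0.0 } else { self.bounds[i - 1] };
                let fraction = (rank - seen as f64) / c as f64;
                return Some(lower + (self.bounds[i] - lower) * fraction);
            }
            seen += c;
        }
        None
    }
}
```

However many samples you record, the storage cost stays at one counter per bucket plus a count and a sum. The price is precision: a percentile read from buckets is an estimate whose accuracy depends on where the bucket boundaries sit.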

Traces (and Spans)

A trace represents the full lifecycle of a single request as it travels through your system across service boundaries, thread boundaries, and async task switches. A trace is made up of spans.

A span is one unit of work within a trace. It has:

A name (e.g. "process_payment")
A start time and end time
A set of attributes (key-value metadata)
A parent span ID (which links it into the trace tree)
A status (OK or ERROR)

When a request enters your service, you open a root span. Every sub-operation opens a child span. When everything completes, the collected tree of spans forms the trace, which you can visualise in a tool like Jaeger as a waterfall diagram.

[  handle_request          100ms  ]
  [  auth_check     5ms  ]
  [  db_query       80ms  ]
    [  query_plan   2ms  ]
    [  row_scan     75ms  ]
  [  serialize      3ms  ]

From this diagram you can immediately see that db_query → row_scan is taking 75 ms and is the bottleneck, something a log or metric alone could not show you.
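To show how parent span IDs turn a flat list of spans into that tree, here is a small standard-library sketch. The Span struct and the helper functions are hypothetical simplifications; real OTel spans carry much more (trace ID, timestamps, attributes, status).

```rust
#[derive(Debug, Clone)]
struct Span {
    id: u32,
    parent: Option<u32>, // None marks the root span
    name: &'static str,
    duration_ms: u64,
}

/// Depth of a span in the tree, found by following parent links to the root.
fn depth(spans: &[Span], span: &Span) -> usize {
    let mut d = 0;
    let mut current = span.parent;
    while let Some(pid) = current {
        d += 1;
        current = spans.iter().find(|s| s.id == pid).and_then(|s| s.parent);
    }
    d
}

/// Renders one indented line per span, in the order given.
fn render(spans: &[Span]) -> Vec<String> {
    spans
        .iter()
        .map(|s| format!("{}{} {}ms", "  ".repeat(depth(spans, s)), s.name, s.duration_ms))
        .collect()
}

/// The slowest span that has no children: a good first suspect for the bottleneck.
fn slowest_leaf(spans: &[Span]) -> Option<&Span> {
    spans
        .iter()
        .filter(|s| !spans.iter().any(|c| c.parent == Some(s.id)))
        .max_by_key(|s| s.duration_ms)
}

/// The same trace as the waterfall diagram above, as a flat list.
fn example_trace() -> Vec<Span> {
    vec![
        Span { id: 1, parent: None, name: "handle_request", duration_ms: 100 },
        Span { id: 2, parent: Some(1), name: "auth_check", duration_ms: 5 },
        Span { id: 3, parent: Some(1), name: "db_query", duration_ms: 80 },
        Span { id: 4, parent: Some(3), name: "query_plan", duration_ms: 2 },
        Span { id: 5, parent: Some(3), name: "row_scan", duration_ms: 75 },
        Span { id: 6, parent: Some(1), name: "serialize", duration_ms: 3 },
    ]
}
```

A trace backend does essentially this on a larger scale: spans arrive independently, and the parent links are all it needs to rebuild the tree.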

How the Three Signals Complement Each Other

Question	Best signal
Did anything go wrong?	Logs
Is my service degraded overall?	Metrics
Why was this specific request slow?	Traces
What was happening when this span ran?	Logs in span

In a well-instrumented service all three are correlated: a metric spike leads you to a trace, a trace leads you to a log line that tells you exactly what broke. OpenTelemetry is the glue that makes this correlation automatic.

Part 2: OpenTelemetry Architecture

The Spec, the SDKs, and the Collector

OpenTelemetry is three things at once:

A specification: defines a standard data model and wire protocol (OTLP) for all three signals.
SDKs: language-specific libraries (opentelemetry, opentelemetry_sdk in Rust) that implement the spec in your process.
The Collector: an optional but recommended sidecar process that receives OTLP data from your app, processes it (batching, filtering, enrichment), and fans it out to one or more backends.

  Your Rust app
  ┌────────────────────────────────────┐
  │  tracing macros (info!, span!, …)  │
  │           │                        │
  │   tracing-subscriber registry      │
  │   ┌───────┴──────────────────────┐ │
  │   │ EnvFilter                    │ │
  │   │ fmt::layer (JSON → stdout)   │ │
  │   │ tracing-opentelemetry layer  │─┼──→ OTLP gRPC (traces)
  │   │ OTelTracingBridge layer      │─┼──→ OTLP gRPC (logs)
  │   └──────────────────────────────┘ │
  │                                    │
  │   global::meter() API              │─→ OTLP gRPC (metrics)
  └────────────────────────────────────┘
              │ :4317
  ┌───────────▼──────────────┐
  │   OTel Collector         │
  │   receivers: [otlp]      │
  │   processors: [batch]    │
  │   exporters:             │
  │     traces  → Jaeger     │
  │     metrics → Prometheus │
  │     logs    → Loki       │
  └──────────────────────────┘

Why use a Collector at all?

You could point your app directly at Jaeger or Prometheus. The Collector sits in the middle because it:

Decouples your app from your backend. Change backends by editing one YAML file in the Collector; your app does not change.
Buffers and batches. Your app does not block on network I/O to a remote backend.
Enriches telemetry. Add Kubernetes pod labels, cloud region, etc. without touching your code.
Fans out. Send the same trace to Jaeger and your cloud provider simultaneously.

OTLP: the wire protocol

OTLP (OpenTelemetry Protocol) is the standard format OpenTelemetry SDKs use to ship data to a Collector or backend. It runs over gRPC (default port 4317) or HTTP (default port 4318, carrying protobuf or JSON payloads). In this setup we use gRPC via the tonic transport because it sends compact binary protobuf over a single multiplexed HTTP/2 connection, which keeps per-export overhead low.

Part 3: Setting Up the Rust Project

Crates

Add these to Cargo.toml:

[dependencies]
# Core OTel API (trace + metrics)
opentelemetry = { version = "0.31.0", features = ["trace", "metrics"] }

# OTLP exporter: gRPC transport (tonic), all three signals
opentelemetry-otlp = { version = "0.31.0", features = ["grpc-tonic", "trace", "metrics", "logs"] }

# SDK implementations of the three providers
opentelemetry_sdk = { version = "0.31.0", features = ["rt-tokio", "trace", "metrics", "logs"] }

# Bridges tracing log events into OTel log records
opentelemetry-appender-tracing = "0.31.1"

# The tracing subscriber ecosystem
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt", "json"] }

# Converts tracing spans into OTel spans
tracing-opentelemetry = "0.32.1"

A few things worth noting:

opentelemetry is the API crate: thin, with no runtime cost if telemetry is disabled. Your library code should only ever depend on this.
opentelemetry_sdk is the implementation crate: the actual providers, exporters, and batch processors. Only your binary / top-level crate should pull this in.
rt-tokio enables the SDK's Tokio runtime integration. In recent SDK releases the default batch processors and the periodic metric reader run on their own background threads, so this feature mainly matters if you opt into the runtime-based (async) processors. Check the changelog for the version you pin.
opentelemetry-appender-tracing and tracing-opentelemetry are the bridge crates that connect the tracing world to the OTel world.

The Config Structure

Before the telemetry initialisation code, you need to know what makes it configurable. We use a TelemetryConfig struct that is loaded from a YAML config file (or environment variables):

pub struct TelemetryConfig {
    pub level: LogLevel,       // fallback log level (overridden by RUST_LOG)
    pub endpoint: String,      // OTLP collector address, e.g. "http://localhost:4317"
    pub tracer_name: String,   // scopes spans from this service in the backend
    pub app_name: String,      // becomes service.name in all telemetry
}

This is dependency injection: the telemetry module does not read env vars or config files directly; it receives a plain struct. This makes it easy to test and easy to change the config source without touching telemetry.rs.

Part 4: The `init_telemetry` Function, Line by Line

All the setup lives in src/utils/telemetry.rs. Let us walk through it section by section.

Step 1: The Resource

fn resource(app_name: &str) -> Resource {
    use opentelemetry::KeyValue;

    Resource::builder()
        .with_attribute(KeyValue::new("service.name", app_name.to_owned()))
        .build()
}

A Resource is metadata that describes the entity producing telemetry: "who is sending this data?" Every span, metric data point, and log record is stamped with the resource. This is how your observability backend knows that a span belongs to rustapp and not to some other service.

service.name is a required OTel semantic convention. Without it, Jaeger will still accept your traces but will label them as an unknown service.

We build the resource once per signal rather than sharing a single clone because Resource is immutable and cheap to construct; keeping three separate instances avoids any ownership entanglement.

Step 2: Traces

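use opentelemetry::trace::TracerProvider as _;            // brings .tracer() into scope
use opentelemetry_otlp::{SpanExporter, WithExportConfig}; // WithExportConfig provides .with_endpoint()
use opentelemetry_sdk::trace::SdkTracerProvider;

let span_exporter = SpanExporter::builder()
    .with_tonic()               // gRPC transport via the tonic crate
    .with_endpoint(endpoint)    // "http://localhost:4317"
    .build()?;

let tracer_provider = SdkTracerProvider::builder()
    .with_resource(resource(&cfg.app_name))
    .with_batch_exporter(span_exporter)
    .build();

let tracer = tracer_provider.tracer(cfg.tracer_name.clone());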

with_batch_exporter is important. Without it you would call with_simple_exporter, which makes a synchronous gRPC call for every single span, an enormous performance problem under any real load. The batch exporter accumulates spans in memory and sends them in bulk, periodically or when the buffer fills.
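The batching idea can be sketched in a few lines of standard-library Rust. This is a conceptual model only: BatchBuffer and its methods are hypothetical names, and the real SDK hands spans to a background worker rather than draining a plain queue.

```rust
use std::collections::VecDeque;

/// Conceptual model of a batch span processor's in-memory buffer.
struct BatchBuffer {
    queue: VecDeque<String>,
    max_queue_size: usize,
    max_batch_size: usize,
    dropped: u64,
}

impl BatchBuffer {
    fn new(max_queue_size: usize, max_batch_size: usize) -> Self {
        Self { queue: VecDeque::new(), max_queue_size, max_batch_size, dropped: 0 }
    }

    /// Called on the hot path when a span ends. It never waits on the
    /// network: if the queue is full, the span is dropped and counted.
    fn enqueue(&mut self, span: String) {
        if self.queue.len() >= self.max_queue_size {
            self.dropped += 1;
        } else {
            self.queue.push_back(span);
        }
    }

    /// Called by the background exporter: takes up to `max_batch_size`
    /// spans so they can be sent in a single export call.
    fn next_batch(&mut self) -> Vec<String> {
        let n = self.queue.len().min(self.max_batch_size);
        self.queue.drain(..n).collect()
    }
}
```

The design trade-off is visible in enqueue: request handling stays fast because it never blocks, at the cost of losing spans if the exporter cannot keep up.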

The tracer handle (scoped by tracer_name) is what we hand to tracing-opentelemetry so it knows which provider to route spans through.

Step 3: Metrics

let metric_exporter = MetricExporter::builder()
    .with_tonic()
    .with_endpoint(endpoint)
    .build()?;

let meter_provider = SdkMeterProvider::builder()
    .with_resource(resource(&cfg.app_name))
    .with_reader(PeriodicReader::builder(metric_exporter).build())
    .build();

opentelemetry::global::set_meter_provider(meter_provider.clone());

PeriodicReader is the metrics equivalent of batch export: it polls all registered instruments at a fixed interval (default 60 seconds) and sends the current values to the exporter. You never push a metric manually; you just record values and the reader handles the rest.
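The record-now, collect-later split can be modelled with an atomic counter. The sketch below is an illustration under simplifying assumptions, not the SDK's design: Counter and Reader are hypothetical types, and collect stands in for one tick of the reader's timer.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// A counter handle that application code can clone and increment freely.
#[derive(Clone, Default)]
struct Counter(Arc<AtomicU64>);

impl Counter {
    fn add(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
}

/// Stand-in for the reader: it keeps a handle to every registered instrument.
#[derive(Default)]
struct Reader {
    instruments: Vec<(String, Counter)>,
}

impl Reader {
    /// Creates a counter and remembers it so later collections can see it.
    fn counter(&mut self, name: &str) -> Counter {
        let counter = Counter::default();
        self.instruments.push((name.to_string(), counter.clone()));
        counter
    }

    /// What one tick of the periodic timer would do: read the current
    /// cumulative value of every instrument for the exporter to send.
    fn collect(&self) -> Vec<(String, u64)> {
        self.instruments
            .iter()
            .map(|(name, c)| (name.clone(), c.0.load(Ordering::Relaxed)))
            .collect()
    }
}
```

Application code only ever calls add; nothing is sent until the reader's next collection.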

set_meter_provider registers the provider in a global registry. This means any module in your codebase can call opentelemetry::global::meter("name") without needing a direct reference to the provider, analogous to how tracing's global subscriber works.

Note we pass meter_provider.clone() to set_meter_provider and keep the original for the Shutdown guard. The provider is reference-counted internally, so the clone is cheap.

Step 4: Logs

let log_exporter = LogExporter::builder()
    .with_tonic()
    .with_endpoint(endpoint)
    .build()?;

let logger_provider = SdkLoggerProvider::builder()
    .with_resource(resource(&cfg.app_name))
    .with_batch_exporter(log_exporter)
    .build();

The log provider is not registered globally because there is no set_logger_provider equivalent in the OTel API. Instead, we pass it directly to the OpenTelemetryTracingBridge layer in the next step. The bridge keeps the reference alive.

Step 5: The tracing-subscriber Registry

This is where everything gets wired together:

// SubscriberExt provides .with(); SubscriberInitExt provides .init().
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

let env_filter =
    EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new(cfg.level.as_str()));

tracing_subscriber::registry()
    .with(env_filter)
    .with(tracing_subscriber::fmt::layer().json())
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .with(
        opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge::new(
            &logger_provider,
        ),
    )
    .init();

Think of the registry as a pipeline. Every info!() call, every span, passes through each layer in order:

Layer	What it does
`EnvFilter`	Reads `RUST_LOG` at runtime (e.g. `RUST_LOG=debug`). If unset, falls back to `cfg.level`. Events that do not match are dropped here; nothing below sees them.
`fmt::layer().json()`	Serialises events to structured JSON and writes them to stdout. Good for production log aggregators (Fluentd, Loki) that parse JSON.
`tracing_opentelemetry::layer()`	Converts `tracing` spans into OTel spans and ships them through the `SdkTracerProvider` we built above.
`OpenTelemetryTracingBridge`	Converts `tracing` log events into OTel log records and ships them through `SdkLoggerProvider`.

The key insight: you never call OTel APIs for logs or traces. You just write info!() and #[instrument] as you normally would in any Rust code. The bridge layers do the forwarding silently.

Step 6: The Shutdown Guard

pub struct Shutdown {
    tracer_provider: SdkTracerProvider,
    meter_provider: SdkMeterProvider,
    logger_provider: SdkLoggerProvider,
}

impl Drop for Shutdown {
    fn drop(&mut self) {
        let _ = self.tracer_provider.shutdown();
        let _ = self.meter_provider.shutdown();
        let _ = self.logger_provider.shutdown();
    }
}

This is the RAII (Resource Acquisition Is Initialisation) pattern. When main returns, its local variables go out of scope and Shutdown is dropped, which calls each provider's shutdown(). That method flushes any spans, metrics, or log records still sitting in the internal buffers before the process exits.

Without this, the last batch of telemetry is silently lost every time your process exits normally. In practice this means you will miss the final spans from a graceful shutdown, exactly when you most want to see what happened.

In main.rs, the guard must be kept alive for the full program lifetime:

let _shutdown = init_telemetry(&config.telemetry)?;
//  ^
//  The leading underscore suppresses "unused variable" warnings, but
//  crucially the binding is NOT dropped immediately (unlike `let _ = ...`
//  which would drop it at the end of the statement).

The distinction matters: let _ = init_telemetry(...) would drop the Shutdown immediately after construction, calling shutdown before your app even starts. let _shutdown = ... keeps it alive until the end of main.
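You can observe the difference without any telemetry crates. In the sketch below, Guard is a hypothetical stand-in for Shutdown that records the moment it is dropped, and the two functions differ only in the binding pattern.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type EventLog = Rc<RefCell<Vec<&'static str>>>;

/// Stand-in for the Shutdown guard: it records when it is dropped.
struct Guard {
    log: EventLog,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("flush");
    }
}

fn make_guard(log: &EventLog) -> Guard {
    Guard { log: Rc::clone(log) }
}

/// `let _ = ...` does not bind the value, so it is dropped at the end
/// of that statement, before any application work runs.
fn run_with_wildcard() -> Vec<&'static str> {
    let log: EventLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _ = make_guard(&log);
        log.borrow_mut().push("app work");
    }
    let events = log.borrow().clone();
    events
}

/// `let _guard = ...` is a real binding, so the value lives until the
/// end of the enclosing scope.
fn run_with_named_binding() -> Vec<&'static str> {
    let log: EventLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _guard = make_guard(&log);
        log.borrow_mut().push("app work");
    }
    let events = log.borrow().clone();
    events
}
```

With the wildcard the flush happens before the application work; with the named binding it happens after, which is the order you want for a telemetry guard.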

Part 5: Using Telemetry in Application Code

Logs: just use tracing macros

use tracing::{info, warn, error, debug};

info!("application started");
warn!(user_id = 42, "quota almost reached");
error!(err = %e, "payment failed");
debug!(payload = ?body, "received request"); // ? = Debug format

These flow through the subscriber pipeline automatically. The JSON layer writes them to stdout; the bridge ships them to OTLP. You write one macro call and get two destinations.

Traces: instrument whole functions (preferred)

use tracing::instrument;

#[instrument]
async fn process_job(job_id: u64) {
    info!("processing started");
    do_work().await;
    info!("processing complete");
}

#[instrument] creates a span named after the function (process_job) every time it is called. The job_id argument is automatically captured as a span attribute. Any log events emitted inside the function are nested under the span in the trace backend.

You can customise what gets captured:

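#[instrument(skip(user, password), fields(user.id = %user.id))]
async fn login(user: &User, password: &str) { … }

skip keeps arguments out of the span: password because it is sensitive, and user because we only want its id (skipping it also means User does not need a Debug implementation). fields lets you add custom attributes beyond the function arguments.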

Traces: manual spans for finer control

use tracing::{info_span, Instrument};

let span = info_span!("send_email", recipient = %email, template = "welcome");
send_email_inner(payload).instrument(span).await;

Use manual spans when you want to name the span something different from the function name, or when you are adding a span around a block of code that is not a whole function.

Metrics: the one place you call OTel directly

This setup has no tracing bridge for metrics, so you use the OTel API directly. It is still straightforward:

use opentelemetry::global;

// --- Counter (monotonically increasing, e.g. total requests processed) ---
let meter = global::meter("payment-service");
let counter = meter.u64_counter("payments.processed").build();
counter.add(1, &[]);

// --- Histogram (distribution of values, e.g. latency) ---
let histogram = meter.f64_histogram("payment.duration_ms").build();
histogram.record(latency_ms, &[]);

// --- With attributes (dimensions/labels for slicing in dashboards) ---
use opentelemetry::KeyValue;
counter.add(1, &[
    KeyValue::new("payment.method", "card"),
    KeyValue::new("payment.status", "success"),
]);

Attributes are the metric equivalent of log fields: they let you slice the data by dimension (e.g. "show me payments.processed grouped by payment.method").

Create the meter and instrument objects once (ideally at service startup or in a lazy_static / OnceLock) and reuse them. Creating a new meter on every request is wasteful; the underlying objects are meant to be long-lived.

Part 6: The Collector and Local Infrastructure

docker-compose.yaml

The local dev stack brings up four services:

services:
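  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.123.0
    # Mount the config shown below; without a mount the image runs its built-in default.
    # The container path assumes the contrib image's default config location.
    volumes:
      - ./configs/otel-collector.yaml:/etc/otelcol-contrib/config.yaml:ro
    ports:
      - "4317:4317" # OTLP gRPC  ← your app sends here
      - "4318:4318" # OTLP HTTP
      - "8888:8888" # Collector self-metrics

  jaeger:
    image: jaegertracing/jaeger:2.6.0
    environment:
      COLLECTOR_OTLP_ENABLED: "true" # quoted: Compose expects string values here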
    ports:
      - "16686:16686" # Jaeger UI → http://localhost:16686

  postgres: …
  redis: …

Your app sends OTLP to localhost:4317. The Collector receives it, processes it, and forwards traces to Jaeger. Open http://localhost:16686 in your browser to see the trace waterfall.

configs/otel-collector.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317 # internal Docker network name
    tls:
      insecure: true
  debug:
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

The Collector config is straightforward: receive on gRPC, batch everything, and export traces to Jaeger. In production you would replace debug with a real backend: Prometheus remote write for metrics, Loki or Elasticsearch for logs.

Running locally

make up      # starts docker-compose (Collector, Jaeger, Postgres, Redis)
make dev     # cargo run with CONFIG_PATH=configs/config.yaml

Then open http://localhost:16686 → Search → select rustapp → hit Find Traces. Every request your app handles will appear as a trace.

Part 7: Things to Know in Production

Version alignment is critical

The opentelemetry, opentelemetry_sdk, opentelemetry-otlp, and opentelemetry-appender-tracing crates must share the same minor version, and tracing-opentelemetry must be a release built against that version (its numbering differs: 0.32.x here pairs with OTel 0.31). The OTel Rust ecosystem moves fast, and because these crates are pre-1.0, minor bumps often break the API. Pin all of them and update them together.

The batch processor has knobs

The default batch exporter settings (max queue size 2048 spans, max export batch size 512, scheduled delay 5 seconds, export timeout 30 seconds) are tuned for moderate throughput. Under high load, you may need to adjust them:

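use opentelemetry_sdk::trace::{BatchConfigBuilder, BatchSpanProcessor, SdkTracerProvider};

let batch_config = BatchConfigBuilder::default()
    .with_max_queue_size(8192)
    .with_max_export_batch_size(1024)
    .build();

// Build the processor explicitly so it picks up the custom config, then
// register it in place of with_batch_exporter. (Builder names as of the
// 0.31 SDK; verify against the version you pin.)
let span_processor = BatchSpanProcessor::builder(span_exporter)
    .with_batch_config(batch_config)
    .build();

let tracer_provider = SdkTracerProvider::builder()
    .with_span_processor(span_processor)
    .build();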

If the queue fills up, spans are dropped silently. Monitor the Collector's self-metrics (http://localhost:8888/metrics) to spot export failures.

`RUST_LOG` overrides everything

At runtime, set RUST_LOG=debug to get verbose output, or RUST_LOG=warn to silence everything below warnings. You can target specific modules:

RUST_LOG=rustapp=debug,sqlx=warn,tokio=error

The EnvFilter is checked first; if a log event does not match, it never reaches the JSON layer or the OTel bridge. This makes it safe to leave debug!() calls in the code: a filtered-out event is skipped after a cheap level check, so it costs next to nothing in production unless you explicitly enable it.
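To illustrate how a directive string such as the one above maps targets to levels, here is a deliberately simplified matcher written with the standard library only. It is not EnvFilter's implementation (which also supports span and field filters); Level, Filter, and the longest-prefix rule are assumptions made for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Holds "target=level" directives plus an optional bare default level.
struct Filter {
    default: Option<Level>,
    targets: Vec<(String, Level)>,
}

impl Filter {
    fn parse(spec: &str) -> Self {
        let mut default = None;
        let mut targets = Vec::new();
        for part in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
            match part.split_once('=') {
                Some((target, level)) => {
                    if let Some(level) = parse_level(level) {
                        targets.push((target.to_string(), level));
                    }
                }
                // A bare level such as "debug" sets the default.
                None => default = parse_level(part).or(default),
            }
        }
        Self { default, targets }
    }

    /// An event passes if its level is no more verbose than the level
    /// allowed for the longest matching target prefix (or the default).
    fn enabled(&self, target: &str, level: Level) -> bool {
        let allowed = self
            .targets
            .iter()
            .filter(|(t, _)| {
                target == t.as_str() || target.starts_with(format!("{t}::").as_str())
            })
            .max_by_key(|(t, _)| t.len())
            .map(|(_, l)| *l)
            .or(self.default);
        match allowed {
            Some(max) => level <= max,
            None => false,
        }
    }
}
```

In this sketch a target with no matching directive and no default is simply disabled, which is why the decision can be made before any formatting work happens.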

`let _shutdown` vs `let _ = ...`

This deserves repeating because it is a common footgun:

let _shutdown = init_telemetry(&config.telemetry)?;  // CORRECT: lives to end of main
let _ = init_telemetry(&config.telemetry)?;          // WRONG: dropped immediately

Rust's _ pattern drops the value on the spot. _name creates a binding that lives to the end of its scope. For RAII guards, always use the latter.

Sampling

Collecting every single span in a high-traffic service is expensive. OTel supports trace sampling:

Always-on (default): every trace is collected. Fine for dev; costly in prod.
TraceIdRatioBased: sample a percentage (e.g. 1%) of traces randomly.
ParentBased: respect the sampling decision of the parent span (important for distributed tracing across services).

Configure sampling on the SdkTracerProvider before you go to production.
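The two non-trivial strategies can be sketched as pure functions. This is a simplified illustration of the idea, not the SDK's sampler code: should_sample and parent_based are hypothetical names, and the exact bit handling in real samplers may differ.

```rust
/// Ratio-based decision that depends only on the trace ID, so every
/// service that sees the same trace reaches the same answer.
fn should_sample(trace_id: u128, ratio: f64) -> bool {
    let ratio = ratio.clamp(0.0, 1.0);
    // Take the low 64 bits of the ID and shift one bit away, leaving a
    // 63-bit number that is uniformly distributed for random trace IDs.
    let value = (trace_id as u64) >> 1;
    // Scale the ratio onto the same 63-bit range.
    let threshold = (ratio * (1u64 << 63) as f64) as u64;
    value < threshold
}

/// Parent-based wrapper: if an upstream service already decided, follow
/// that decision; only root spans fall back to the ratio rule.
fn parent_based(parent_sampled: Option<bool>, trace_id: u128, ratio: f64) -> bool {
    match parent_sampled {
        Some(decision) => decision,
        None => should_sample(trace_id, ratio),
    }
}
```

Combining the two is the usual production choice: the ratio limits volume at the edge, and the parent-based rule keeps each trace either complete or absent rather than half-collected.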

Quick Reference

Which macro for which job?

info!("message")                   // plain log
info!(key = value, "message")      // structured log with fields
#[instrument]                      // auto-span the whole function
#[instrument(skip(secret))]        // auto-span, hide a field
info_span!("name", key = value)    // manual span
.instrument(span)                  // attach span to a future

Metric types at a glance

meter.u64_counter("name").build()       // count up (requests, jobs)
meter.f64_histogram("name").build()     // distribution (latency, size)
meter.i64_up_down_counter("name").build() // goes up and down (queue depth)
meter.f64_gauge("name").build()         // current value (temperature, ratio)

Signal summary

Signal	How you emit it	Where it goes
Log	`info!()` / `warn!()` / `error!()`	stdout + OTel log backend
Trace	`#[instrument]` or `info_span!()`	OTel trace backend (Jaeger)
Metric	`global::meter().u64_counter()` etc.	OTel metric backend

Part 8: Complete Usage Examples (Import to Run)

This section gives you copy-paste-ready, realistic code for every signal. Each example shows the full import block so you do not have to hunt for crate names.

Logs: complete example

Logs are the simplest signal. No setup beyond what init_telemetry already does. Just import and call.

// src/services/user_service.rs
use tracing::{debug, error, info, instrument, warn};
use uuid::Uuid;

pub struct UserService { /* … db pool, etc. */ }

impl UserService {
    /// Creates a new user.
    /// The #[instrument] macro wraps the whole function in a span named
    /// "create_user" and records `email` as a span attribute automatically.
    /// Every log inside is nested under that span in Jaeger.
    #[instrument(skip(self), fields(service = "user"))]
    pub async fn create_user(&self, email: &str) -> Result<Uuid, anyhow::Error> {
        // Plain info log: shows up in stdout JSON and in the OTel log backend.
        info!(email, "creating user");

        // Structured fields come before the message string.
        // `%` means Display format, `?` means Debug format.
        debug!(email, attempt = 1, "querying db");

        let user_id = Uuid::now_v7(); // requires the uuid crate's "v7" feature

        match self.db_insert(user_id, email).await {
            Ok(_) => {
                // Attach key-value context to the log line.
                info!(user_id = %user_id, email, "user created successfully");
                Ok(user_id)
            }
            Err(e) => {
                // `%e` formats the error with its Display impl.
                error!(err = %e, email, "failed to insert user");
                Err(e)
            }
        }
    }

    async fn db_insert(&self, _id: Uuid, _email: &str) -> Result<(), anyhow::Error> {
        Ok(()) // placeholder
    }
}

What you get:

stdout: a JSON log line per info! / error! call
Jaeger: the create_user span with all fields as attributes, and the log events attached to the span as span events

Traces: complete example

Option A: `#[instrument]` (use this by default)

// src/services/payment_service.rs
use tracing::{info, instrument, warn};

pub struct PaymentService;

impl PaymentService {
    /// `skip(self, card_number)`: hides sensitive fields from the span.
    /// `fields(…)`: adds custom attributes that are not function arguments.
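    #[instrument(
        skip(self, card_number),
        fields(
            payment.method = "card",
            payment.currency = %currency,
            // Declared empty up front: Span::record only fills fields that already exist.
            payment.txn_id = tracing::field::Empty,
        )
    )]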
    pub async fn charge(
        &self,
        user_id: u64,
        amount_cents: u64,
        currency: &str,
        card_number: &str,   // skipped: never appears in telemetry
    ) -> Result<String, anyhow::Error> {
        info!(user_id, amount_cents, "initiating charge");

        // Simulate calling a downstream payment provider.
        let result = self.call_provider(amount_cents).await;

        match result {
            Ok(ref txn_id) => {
                // Attach the transaction ID to the current span's fields.
                tracing::Span::current().record("payment.txn_id", txn_id.as_str());
                info!(txn_id, "charge successful");
                result
            }
            Err(ref e) => {
                warn!(err = %e, user_id, "charge failed");
                result
            }
        }
    }

    async fn call_provider(&self, _amount: u64) -> Result<String, anyhow::Error> {
        Ok("txn_abc123".to_string()) // placeholder
    }
}

Option B: manual spans for fine-grained control

Use this when you need a span around a specific block of code (not a whole function), or when you want to name the span differently from the function.

// src/jobs/email_job.rs
use tracing::{info, info_span, Instrument};

pub async fn send_welcome_emails(user_ids: Vec<u64>) {
    for user_id in user_ids {
        // Build a span with custom attributes, then attach it to the future.
        let span = info_span!(
            "send_welcome_email",
            user.id = user_id,
            email.template = "welcome_v2",
        );

        async move {
            info!(user_id, "sending email");
            // … actual send logic …
            info!(user_id, "email sent");
        }
        .instrument(span)
        .await;
    }
}

Option C: error status on a span

When an operation fails, mark the span as an error so Jaeger highlights it:

// src/handlers/checkout.rs
use opentelemetry::trace::Status;
use tracing::{error, instrument};
use tracing_opentelemetry::OpenTelemetrySpanExt; // needed for set_status

#[instrument]
pub async fn checkout_handler(order_id: u64) -> Result<(), anyhow::Error> {
    let result = process_order(order_id).await;

    if let Err(ref e) = result {
        // Mark the current span as ERROR so Jaeger colors it red.
        tracing::Span::current()
            .set_status(Status::error(e.to_string()));
        error!(err = %e, order_id, "checkout failed");
    }

    result
}

async fn process_order(_id: u64) -> Result<(), anyhow::Error> {
    Ok(())
}

Metrics: complete example

Metrics are the one place you call the OTel API directly. Create your instruments once at startup and reuse them throughout the service lifetime. The idiomatic pattern in Rust is OnceLock or a struct that holds the meters.

Recommended pattern: instrument struct

// src/metrics.rs
use opentelemetry::{
    global,
    metrics::{Counter, Histogram, Meter, UpDownCounter},
    KeyValue,
};
use std::sync::OnceLock;

/// All metrics for the HTTP layer.
pub struct HttpMetrics {
    /// Total number of HTTP requests received.
    pub requests_total: Counter<u64>,

    /// Number of requests currently in-flight.
    pub requests_in_flight: UpDownCounter<i64>,

    /// Request duration in milliseconds (histogram = p50/p95/p99).
    pub request_duration_ms: Histogram<f64>,

    /// Response body size in bytes.
    pub response_bytes: Histogram<u64>,
}

impl HttpMetrics {
    pub fn new() -> Self {
        // "http-server" is the instrumentation scope visible in your backend
        // to distinguish metrics from different components.
        let meter: Meter = global::meter("http-server");

        Self {
            requests_total: meter
                .u64_counter("http.requests.total")
                .with_description("Total HTTP requests received")
                .build(),

            requests_in_flight: meter
                .i64_up_down_counter("http.requests.in_flight")
                .with_description("Requests currently being processed")
                .build(),

            request_duration_ms: meter
                .f64_histogram("http.request.duration_ms")
                .with_description("Request latency in milliseconds")
                // Optional: explicit histogram bucket boundaries.
                // Default buckets work for most cases.
                .build(),

            response_bytes: meter
                .u64_histogram("http.response.size_bytes")
                .with_description("Response body size in bytes")
                .build(),
        }
    }
}

// Global singleton initialised once after `init_telemetry()`.
static HTTP_METRICS: OnceLock<HttpMetrics> = OnceLock::new();

pub fn http_metrics() -> &'static HttpMetrics {
    HTTP_METRICS.get_or_init(HttpMetrics::new)
}

Using the metrics in a handler

// src/handlers/orders.rs
use std::time::Instant;
use opentelemetry::KeyValue;
use crate::metrics::http_metrics;

pub async fn list_orders_handler(/* … */) {
    let m = http_metrics();

    // Dimensions (attributes) let you slice metrics in dashboards.
    // e.g. "show me requests grouped by method and route"
    let attrs = &[
        KeyValue::new("http.method", "GET"),
        KeyValue::new("http.route", "/orders"),
    ];

    // Track in-flight requests (goes up when request starts, down when done).
    m.requests_in_flight.add(1, attrs);
    let start = Instant::now();

    let result = fetch_orders_from_db().await;

    let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;

    // Record latency in the histogram; this feeds p50/p95/p99 in Grafana.
    m.request_duration_ms.record(elapsed_ms, attrs);

    // Increment total counter with a status dimension.
    let status = if result.is_ok() { "200" } else { "500" };
    m.requests_total.add(1, &[
        KeyValue::new("http.method", "GET"),
        KeyValue::new("http.route", "/orders"),
        KeyValue::new("http.status_code", status),
    ]);

    m.requests_in_flight.add(-1, attrs);
}

async fn fetch_orders_from_db() -> Result<Vec<()>, anyhow::Error> {
    Ok(vec![])
}

Counter vs Histogram vs UpDownCounter vs Gauge: when to use each

Instrument	Rust type	When to use
`u64_counter`	`Counter<u64>`	Things that only go up: requests, jobs, errors
`f64_counter`	`Counter<f64>`	Same, but fractional: seconds of CPU time consumed
`i64_up_down_counter`	`UpDownCounter<i64>`	Things that go up and down: queue depth, active sessions
`f64_histogram`	`Histogram<f64>`	Distribution: latency, request size, memory allocated
`u64_histogram`	`Histogram<u64>`	Same but integer: file sizes, row counts
`f64_gauge`	`Gauge<f64>`	Current snapshot value: CPU%, memory%, temperature

Counter: never decreases. If you need "requests per second" in Grafana, you derive it from a counter with rate(http_requests_total[1m]). Do not use a gauge for things that are logically monotonic.

Histogram: sorts each observation into buckets so the backend can estimate percentiles (p50, p95, p99). Always use a histogram for latency measurements, never a gauge showing "current request time".

UpDownCounter: like a counter, but it can also decrease. Use for in-flight requests, queue depth, connection pool usage.

Gauge: a snapshot of the current value at collection time. Use for things that are already a ratio or absolute current reading: memory utilisation, CPU percentage, cache hit ratio.
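The Counter note above says that "requests per second" is derived from a counter rather than stored. The sketch below shows the arithmetic in plain Rust under simplifying assumptions: Sample and rate_per_second are hypothetical names, and unlike PromQL's rate() this version does no extrapolation to the edges of the window.

```rust
/// One reading of a monotonic counter: (timestamp in seconds, cumulative value).
type Sample = (f64, u64);

/// Average per-second increase between the first and last sample,
/// treating any decrease as a counter reset (for example a process restart).
fn rate_per_second(samples: &[Sample]) -> Option<f64> {
    let (first, last) = (samples.first()?, samples.last()?);
    let elapsed = last.0 - first.0;
    if elapsed <= 0.0 {
        return None;
    }
    let mut increase = 0u64;
    for pair in samples.windows(2) {
        let (prev, next) = (pair[0].1, pair[1].1);
        // After a reset the counter restarts from zero, so the whole
        // new value counts as increase.
        increase += if next >= prev { next - prev } else { next };
    }
    Some(increase as f64 / elapsed)
}
```

This is also why a counter must never be decremented by hand: any decrease is interpreted as a restart, and the computed rate would be wrong.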

Putting it all together in `main.rs`

// src/main.rs
mod handlers;
mod metrics;
mod services;
mod utils;

use std::env;

use anyhow::Result;
use tracing::info;

use crate::utils::config::load_config;
use crate::utils::telemetry::init_telemetry;

#[tokio::main]
async fn main() -> Result<()> {
    let config_path =
        env::var("CONFIG_PATH").unwrap_or_else(|_| "configs/config.yaml".to_string());

    let config = load_config(&config_path)?;

    // IMPORTANT: `_shutdown` must stay alive until the end of main.
    // When it is dropped, all three providers flush their buffers.
    // `let _ = ...` would drop it immediately; do NOT do that.
    let _shutdown = init_telemetry(&config.telemetry)?;

    // Warm up the metrics instruments now that the global meter is registered.
    // This ensures the OnceLock is populated before any request comes in.
    let _ = metrics::http_metrics();

    info!(config_path, "application started");

    // … start your axum server, spawn background tasks, etc. …

    Ok(())
    // _shutdown is dropped here → providers flush → process exits cleanly
}

Closing Thoughts

The setup described here is about 170 lines of code, most of it boilerplate that you write once and never touch again. What you get in return is:

Structured JSON logs on stdout, parseable by any log aggregator.
Distributed traces in Jaeger that show exactly what each request did and how long each step took.
Metrics available for any dashboard or alerting system.
One endpoint to change if you swap backends.
No OTel API calls in business logic for logs or traces, just normal tracing macros.

The tracing crate was always the right way to instrument Rust code. OTel extends it from local stdout logs to a production-grade observability pipeline without changing how you write application code. That is the design win worth understanding.