Why your OTel Collector config is probably wrong

Five misconfigurations we see in almost every brownfield OTel Collector deployment, and how to catch them before they cost you.

You’ve deployed the OpenTelemetry Collector. The YAML compiles. Data flows. Dashboards light up. Everything looks fine. Then your next invoice arrives, or an incident slips through because a critical service was silently dropping spans.

Here are five misconfigurations we encounter in nearly every brownfield Collector deployment.

1. No memory limiter processor

This is the single most common omission. Without memory_limiter, your Collector will consume unbounded memory under load spikes and eventually get OOM-killed.

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

The memory_limiter processor should be the first processor in every pipeline. Not the last. Not somewhere in the middle. First. The ordering matters because processors execute in the order they’re listed, and you want to apply backpressure before any other processing consumes memory.

Use OTelBin to visualise your pipeline and verify the ordering.
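
For containerised Collectors, a fixed limit_mib can drift out of sync with the container's memory limit. The sketch below is one way to express the limits as percentages and to place memory_limiter first in the pipeline. The percentages, the loopback endpoint and the debug exporter are illustrative assumptions, not recommendations, and percentage-based limits depend on the Collector being able to detect total memory on your platform.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "127.0.0.1:4317"

processors:
  memory_limiter:
    check_interval: 1s
    # Illustrative values: a soft ceiling at 80% of detected memory,
    # with 20% of headroom reserved for spikes between checks.
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}

exporters:
  # Placeholder exporter so the sketch is complete; swap in your own.
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter is listed first, so it can refuse data
      # before batch allocates anything for it.
      processors: [memory_limiter, batch]
      exporters: [debug]
```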

2. Batch processor with default settings

The default batch processor configuration ships with a timeout of 200ms and a send_batch_size of 8192. For most production workloads, these defaults are too aggressive on the timeout and too generous on the batch size.

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
    send_batch_max_size: 2048

The key insight: a longer timeout reduces the number of export calls to your backend, which matters a lot when you’re paying per-request or dealing with rate limits. A smaller batch size keeps memory predictable.

Profile your actual throughput before tuning. The right values depend on your data volume, not on blog posts.
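
One way to profile is to let the Collector report on itself. The fragment below raises the verbosity of the Collector's internal metrics. With it in place, the batch processor reports how large its batches are and whether sends are triggered by size or by timeout, under names such as otelcol_processor_batch_batch_send_size and otelcol_processor_batch_timeout_trigger_send. Treat those names as an assumption to check against your Collector version, since internal metric names have changed between releases.

```yaml
service:
  telemetry:
    metrics:
      # "detailed" is the most verbose level; drop back to the default
      # once you have the numbers you need.
      level: detailed
```

If most sends are triggered by timeout, the batch size is rarely reached and the timeout is the setting that matters. If most are triggered by size, the opposite holds.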

3. Receivers listening on 0.0.0.0

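Collector releases before v0.104.0 bind the OTLP receiver to 0.0.0.0:4317 (gRPC) and 0.0.0.0:4318 (HTTP) by default. Later releases default to localhost, but brownfield configs often still set 0.0.0.0 explicitly. In a Kubernetes pod, this is usually fine. On a VM or bare-metal host, you’ve just exposed your telemetry ingestion endpoint to the network.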

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "127.0.0.1:4317"
      http:
        endpoint: "127.0.0.1:4318"

Bind to localhost if your instrumented applications are on the same host. If you need cross-host collection, use mTLS (the Collector supports it natively).
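
For the cross-host case, a minimal sketch of mTLS on the gRPC receiver might look like the following. The certificate paths are hypothetical; the point is that setting client_ca_file makes the receiver require and verify a client certificate, rather than only presenting its own.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Listening on all interfaces is deliberate here, because
        # clients must authenticate with a certificate.
        endpoint: "0.0.0.0:4317"
        tls:
          # Hypothetical paths; point these at your own PKI material.
          cert_file: /etc/otelcol/tls/server.crt
          key_file: /etc/otelcol/tls/server.key
          # Presence of client_ca_file turns on client certificate verification.
          client_ca_file: /etc/otelcol/tls/ca.crt
```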

4. No retry or queue configuration on exporters

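Exporters built on the Collector’s exporter helper, such as otlp and otlphttp, retry and queue by default in recent releases, but the defaults are generic and some exporters don’t expose these settings at all. When a backend outage outlasts max_elapsed_time, or rate limiting fills the queue, telemetry is dropped. Set both explicitly so the behaviour is a decision, not an accident.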

exporters:
  otlphttp:
    endpoint: "https://your-backend.example.com"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

The sending_queue buffers data in memory (or on disk, with the file_storage extension) so transient failures don’t result in data loss. Pair this with the memory_limiter to keep the queue from eating all available RAM.
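
The on-disk variant mentioned above can be sketched as follows, assuming a Collector distribution that includes the file_storage extension (it ships in the contrib distribution). The directory path is hypothetical and must exist and be writable by the Collector process.

```yaml
extensions:
  file_storage:
    # Hypothetical path; create it before starting the Collector.
    directory: /var/lib/otelcol/queue

exporters:
  otlphttp:
    endpoint: "https://your-backend.example.com"
    sending_queue:
      enabled: true
      queue_size: 5000
      # Name of the storage extension that backs the queue.
      storage: file_storage

service:
  # The extension must be enabled here as well as defined above.
  extensions: [file_storage]
```

With a persistent queue, data that is still queued can survive a Collector restart, at the cost of disk I/O on the export path.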

5. Pipelines with mismatched signal types

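This one is subtle. You define a traces pipeline that includes a processor designed for metrics, or you route logs through a traces exporter. A component that doesn’t support the pipeline’s signal type typically fails at startup, but the Collector won’t always error: a multi-signal processor configured only with rules for another signal can start cleanly and do nothing, so data passes through unprocessed or gets dropped downstream.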

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]

Keep each signal type in its own pipeline. If you need cross-signal processing (e.g., generating metrics from spans), use the spanmetrics connector; don’t try to route spans through a metrics processor.
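
As a sketch of that cross-signal case, the pipelines section below wires in the spanmetrics connector, assuming a distribution that includes it (such as contrib). It builds on the receivers, processors and exporters defined in the earlier snippets. The connector appears as an exporter in the traces pipeline and as a receiver in the metrics pipeline, which is what bridges the two signals.

```yaml
connectors:
  # Default settings; real deployments usually tune dimensions and histograms.
  spanmetrics: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      # Spans go to the backend and, in parallel, into the connector.
      exporters: [otlphttp, spanmetrics]
    metrics:
      # Metrics derived from spans enter here alongside regular OTLP metrics.
      receivers: [otlp, spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```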

Validate before you ship

Two tools that will catch most of these issues before they hit production:

  • OTelBin: paste your config, see your pipelines as swimlanes, get validation errors highlighted inline.
  • Telflo: build your pipeline visually if you’re starting from scratch.

And if you want to go deeper on telemetry quality, OllyGarden’s Instrumentation Score can tell you whether the data flowing through your Collector is actually useful.


Have a misconfiguration horror story? We’d love to hear it. Get in touch.