Loki duplicates

From UVOO Tech Wiki

In Loki, deduplication happens at ingest time and, in replicated deployments, at query time. In both cases it is driven by the combination of:

  1. Stream identity (tenant + label set, e.g. {job="myjob"}),
  2. Timestamp (the epoch you push), and
  3. Log line content
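To make the three components concrete, here is a minimal sketch that builds a request body in the shape used by Loki's JSON push endpoint (/loki/api/v1/push). The helper name build_push_payload and the sample values are illustrative assumptions. The tenant is not part of the body; in multi-tenant setups it is sent separately in the X-Scope-OrgID header.

```python
import json


def build_push_payload(labels, entries):
    """Build a JSON-serialisable body for Loki's push endpoint.

    labels  : dict of label name -> value (the stream's label set)
    entries : list of (timestamp_ns, line) tuples
    """
    return {
        "streams": [
            {
                # Label set: together with the tenant, this identifies the stream.
                "stream": dict(labels),
                # Each value is [timestamp, line]; the timestamp is a string
                # holding Unix epoch nanoseconds.
                "values": [[str(ts), line] for ts, line in entries],
            }
        ]
    }


# Hypothetical sample data, for illustration only.
payload = build_push_payload({"job": "myjob"}, [(1700000000000000000, "hello")])
body = json.dumps(payload)
```

Two pushes that produce the same label set, timestamp string, and line for the same tenant describe the same entry as far as deduplication is concerned.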

1. Ingest-time dropping of exact duplicates

When you push two identical log lines (same timestamp, same labels, same exact text) into the same stream, the second one is silently dropped. Loki's ingester treats an incoming entry whose timestamp and text both match one it has already received for that stream as an exact duplicate and ignores it, so only the first copy survives (community.grafana.com).
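The rule above can be illustrated with a toy in-memory model. This is a sketch of the described behaviour only, not Loki's implementation; the class ToyStore and its push method are hypothetical names, and the model checks against every stored entry for simplicity.

```python
class ToyStore:
    """Toy model of exact-duplicate dropping (not Loki's real code)."""

    def __init__(self):
        # (tenant, frozen label set) -> list of (timestamp_ns, line)
        self.streams = {}

    def push(self, tenant, labels, ts_ns, line):
        """Store the entry; return False if it was dropped as a duplicate."""
        key = (tenant, frozenset(labels.items()))
        entries = self.streams.setdefault(key, [])
        if (ts_ns, line) in entries:
            # Same stream, same timestamp, same text: silently ignored.
            return False
        entries.append((ts_ns, line))
        return True


store = ToyStore()
first = store.push("tenant-a", {"job": "myjob"}, 100, "hello")
second = store.push("tenant-a", {"job": "myjob"}, 100, "hello")    # exact duplicate
other = store.push("tenant-a", {"job": "myjob"}, 100, "goodbye")   # same ts, new text
```

In this model the second push is dropped, while the third is kept because its text differs even though the timestamp is the same.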


2. increment_duplicate_timestamp for near-duplicates

If you enable the following in your loki.yaml under limits_config:

limits_config:
  increment_duplicate_timestamp: true

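then on ingest, whenever a new line arrives with the same timestamp as the previous line in the same stream but with different content, Loki nudges the timestamp forward by one nanosecond so that the received order is preserved when you query. This only applies when the log text differs; truly identical lines still collide and get deduped as above (community.grafana.com). Because Loki can receive entries out of order, treat this as best-effort ordering rather than a strict guarantee.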
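A rough sketch of that nudging rule, applied to the entries of a single stream within one push. This is an illustration of the behaviour described above under simplifying assumptions, not Loki's source; the function name nudge_duplicate_timestamps is hypothetical.

```python
def nudge_duplicate_timestamps(entries):
    """Return a copy of entries with colliding timestamps bumped by 1 ns.

    entries: list of (timestamp_ns, line) in received order.
    A timestamp is bumped only when it matches the previous entry's
    timestamp (original or already bumped) AND the text differs.
    """
    out = []
    prev_original_ts = None
    for ts, line in entries:
        new_ts = ts
        if out:
            prev_ts, prev_line = out[-1]
            same_ts = ts == prev_original_ts or ts == prev_ts
            if same_ts and line != prev_line:
                # Move just past the previous (possibly bumped) timestamp.
                new_ts = max(ts, prev_ts + 1)
        out.append((new_ts, line))
        prev_original_ts = ts
    return out


nudged = nudge_duplicate_timestamps([(100, "a"), (100, "b"), (100, "c")])
identical = nudge_duplicate_timestamps([(100, "a"), (100, "a")])
```

Here nudged ends up with timestamps 100, 101 and 102, while identical is left unchanged, so the two exact duplicates still collide and are deduplicated as in section 1.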


3. Query-time deduplication in HA setups

In a multi-replica deployment (i.e. you've set a replication factor > 1), several ingesters each hold their own copy of a stream. When you run a query, the querier collects results from those ingesters and drops entries that share the same label set, nanosecond timestamp, and log line, so you don't see each replica's copy as a separate result (community.grafana.com, saikiranpikili.medium.com).
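As a sketch of what deduplicating across replicas amounts to, the following merges per-replica result lists and keeps one copy of each (labels, timestamp, line) triple. The function merge_replica_results and the sample data are hypothetical, and the real querier works on streamed iterators rather than in-memory lists.

```python
def merge_replica_results(replica_results):
    """Merge entries returned by several replicas, dropping repeated copies.

    replica_results: list (one item per replica) of lists of
                     (labels_dict, timestamp_ns, line) tuples.
    """
    seen = set()
    merged = []
    for entries in replica_results:
        for labels, ts, line in entries:
            key = (frozenset(labels.items()), ts, line)
            if key in seen:
                continue  # another replica already returned this entry
            seen.add(key)
            merged.append((labels, ts, line))
    # Present the merged result in timestamp order.
    merged.sort(key=lambda entry: entry[1])
    return merged


labels = {"job": "myjob"}
replica_a = [(labels, 100, "start"), (labels, 200, "stop")]
replica_b = [(labels, 100, "start"), (labels, 150, "working")]
result = merge_replica_results([replica_a, replica_b])
```

The "start" entry held by both replicas appears once in result, and the entry that only one replica returned is still included.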


So:

  • Two identical messages (same epoch, same job label, same text) → only one is stored.
  • Two messages with the same epoch and job but different text → both are stored (and their received order is preserved if you've enabled increment_duplicate_timestamp).
  • Multiple replicas → query-time dedupe prevents you from seeing the same event more than once.