Loki vs PostgreSQL hypertable

Here’s a rough comparison of what you might see when querying a 1-month, ~100 GB time-series dataset on commodity hardware (e.g. SSDs, 16 cores). Actual numbers will vary based on data cardinality, step size, compression, and parallelism.


PostgreSQL Hypertable (TimescaleDB)

  • Performance: TimescaleDB benchmarks report up to 350×–1,000× faster queries than vanilla PostgreSQL, thanks to time-partitioned “chunks,” adaptive chunk-skipping, and (optionally) continuous aggregates. (timescale.com, tigerdata.com)
  • Latency: A full 1-month range-aggregate over ~100 GB raw (~10 GB compressed) typically completes in 1–3 s. Pre-aggregated (continuous) views can serve sub-100 ms results; see the sketch below.
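
A minimal sketch of both query shapes, assuming a hypothetical `metrics` hypertable (`time timestamptz, value double precision`), a pre-built continuous aggregate `metrics_daily`, and placeholder connection details:

```python
# Sketch: raw range-aggregate vs. continuous-aggregate query against TimescaleDB.
# Table/view names and connection details are assumptions; adjust to your schema.
import psycopg2

conn = psycopg2.connect("dbname=tsdb user=postgres host=localhost")
cur = conn.cursor()

# Raw scan: aggregate one month of data directly from the hypertable (~1-3 s at 100 GB).
cur.execute("""
    SELECT time_bucket('1 day', time) AS day, avg(value)
    FROM metrics
    WHERE time > now() - interval '1 month'
    GROUP BY day
    ORDER BY day;
""")
raw_rows = cur.fetchall()

# Continuous aggregate: the same shape served from a pre-materialized view
# (sub-100 ms), assuming metrics_daily was created WITH (timescaledb.continuous).
cur.execute("""
    SELECT day, avg_value
    FROM metrics_daily
    WHERE day > now() - interval '1 month'
    ORDER BY day;
""")
agg_rows = cur.fetchall()
conn.close()
```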

Prometheus TSDB

  • Architecture: Data is stored in 2 h blocks, each with an index mapping series IDs to chunks; queries are executed via PromQL over these blocks.
  • Latency: A 1-month range query at 1 min resolution (~43,200 points per series) spans ~360 blocks. On a single Prometheus instance, you’re likely to see 5–30 s per query, depending on series cardinality and step; see the sketch below. In a highly parallel Grafana Mimir (Prometheus long-term storage) setup with ~1 billion active series, 99.9 % of reads complete in < 2 s. (grafana.com)
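
As a rough illustration, such a range query could be issued against the Prometheus HTTP API like this (the metric selector and server URL are assumptions; the `requests` library is used for brevity):

```python
# Sketch: a 1-month range query at 1 min resolution via the Prometheus HTTP API.
# Prometheus caps range queries at 11,000 points per series, so the month is
# split into one-week windows (~10,080 points each) and issued sequentially.
# The metric selector and server URL are placeholders; substitute your own.
import time
import requests

WEEK = 7 * 24 * 3600
end = time.time()
results = []

for i in range(4):  # ~1 month = 4 one-week windows
    resp = requests.get(
        "http://localhost:9090/api/v1/query_range",
        params={
            "query": 'rate(http_requests_total[5m])',  # hypothetical metric
            "start": end - (i + 1) * WEEK,
            "end": end - i * WEEK,
            "step": "60s",  # 1 min step
        },
        timeout=120,
    )
    resp.raise_for_status()
    results.extend(resp.json()["data"]["result"])

print(f"{len(results)} per-window series results returned")
```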

Grafana Loki (TSDB index)

  • Indexing model: Only label metadata is indexed; raw log chunks are kept in object storage. Queries are sharded at the TSDB level.
  • Throughput: Loki TSDB aims for 300–600 MB/s per query shard. ([grafana.com][4])
  • Latency (200 GB bench, scaled to ~100 GB; see the sketch below):
    • Label-only filter (e.g. {region="us-east-2"}): ~1 s
    • Full-text line filter (e.g. |= "queen"; regex filters use |~): ~4 s
    • Full-scan aggregation (e.g. count by log level): ~40 s (Quickwit vs. Loki benchmark: 212 GB of logs → 90 s for a full scan; scaling to ~100 GB ≈ 40 s) (quickwit.io)
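
A sketch of the first two query shapes against Loki’s query_range HTTP endpoint (the stream selector, server URL, and time window are placeholders; the `requests` library is assumed):

```python
# Sketch: label-only filter vs. full-text line filter via Loki's query_range API.
# The stream selector and server URL are assumptions; adjust to your deployment.
import time
import requests

end_ns = int(time.time() * 1e9)
start_ns = end_ns - 30 * 24 * 3600 * 10**9  # ~1 month back, in nanoseconds

def loki_query(logql: str):
    resp = requests.get(
        "http://localhost:3100/loki/api/v1/query_range",
        params={"query": logql, "start": start_ns, "end": end_ns, "limit": 1000},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Label-only filter: resolved from the TSDB label index alone (~1 s above).
streams = loki_query('{region="us-east-2"}')

# Line filter: matching chunks are fetched from object storage and scanned (~4 s).
matches = loki_query('{region="us-east-2"} |= "queen"')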

| Technology      | Query Type                              | Approx. Latency (100 GB) |
|-----------------|-----------------------------------------|--------------------------|
| TimescaleDB     | Range-aggregate (continuous)            | < 0.1 s                  |
| TimescaleDB     | Range-aggregate (raw scan)              | 1–3 s                    |
| Prometheus TSDB | 1 min-step range query                  | 5–30 s                   |
| Grafana Loki    | Label filter                            | ~1 s                     |
| Grafana Loki    | Full-text line filter                   | ~4 s                     |
| Grafana Loki    | Full-scan aggregation (count by level)  | ~40 s                    |

Caveats & tuning:

  • TimescaleDB: chunk interval, continuous aggregates, and compression settings can push latencies into the sub-second or millisecond range.
  • Prometheus: increasing --query.max-concurrency, choosing appropriate step sizes, or offloading long-range queries to a remote backend (e.g. Mimir) will improve throughput.
  • Loki: tuning tsdb_max_query_parallelism, using chunk_encoding: snappy, and backing object storage with SSDs can drastically cut query times.

This should give you a ballpark: PostgreSQL hypertables (with TimescaleDB) excel at fast aggregations, Prometheus TSDB is optimized for metrics with moderate query latency, and Loki’s log-centric model trades higher query latency for low index overhead at ingest time.

[4]: https://grafana.com/docs/loki/latest/operations/storage/tsdb/ "Single Store TSDB (tsdb) | Grafana Loki documentation"