clickhouse · performance · engineering · database

2.71 GiB → 166 MiB: How One Column Was Eating 94% of Our Storage

170 million rows. 2.71 GiB on disk. We dropped one column and changed some types. Same data, same queries — 166 MiB. Here's what we learned about ClickHouse compression the hard way.

QCK Engineering Team
February 20, 2026
8 min read


We load-tested our analytics pipeline with 170 million events. Three months of simulated click data. The numbers came back:

Compressed (on disk):  2.71 GiB
Uncompressed:          16.69 GiB
Compression ratio:     6.16x
Per-row (on disk):     16.8 bytes

A 6x compression ratio. Decent—but ClickHouse is built for columnar compression. Well-tuned analytics tables commonly hit 15-30x. We knew we could do a lot better.

So we looked at per-column storage. And one column made us stare at the screen for a while.

94% of Our Disk Was One Column

Every row had a UUID. A random, unique identifier generated for every event. Standard practice—every table needs a primary key, right?

We checked per-column disk usage:

The UUID column:   2.55 GiB  →  94% of total disk
Everything else:   ~160 MiB  →  6%

Read that again. One column consumed 94% of our storage. Every other column in the table—all of them combined—compressed down to roughly 160 MiB. The UUID was 16x larger than everything else put together.
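The breakdown came straight out of ClickHouse's system.columns table. A query along these lines reproduces it (the `events` table name is illustrative, not our actual schema):

```sql
-- Per-column disk usage, largest first. Table name `events` is an assumption.
SELECT
    name,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC;
```

Sorting by data_compressed_bytes is what makes the outlier impossible to miss.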

Why? Because UUIDs are random. Every value is unique. Every byte is entropy. Compression algorithms work by finding patterns, and random data has none. LZ4 looked at 170 million unique 16-byte values and essentially gave up.

170,000,000 × 16 bytes = 2.55 GiB of incompressible noise

Meanwhile, the columns we actually queried—the ones powering our dashboards and aggregations—were compressing beautifully. Some were achieving ratios we didn't think were possible. Hundreds-to-one. The compressor was doing incredible work on everything except the random UUID.

And the UUID? We never queried it. Not once. Not in any dashboard. Not in any API endpoint. Not in any aggregation. We had other columns handling ordering and lookups. The UUID was a habit from relational database thinking, carried over into a columnar world where it became our single biggest cost.

We Dropped It

One migration. One DROP COLUMN. The results were immediate.
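On a MergeTree table, the whole migration is one statement (table and column names assumed for illustration):

```sql
-- Removes the column; its on-disk files are cleaned up as parts are rewritten.
ALTER TABLE events DROP COLUMN event_uuid;
```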

The Type Overhaul

With the UUID gone, we looked at what was left. Most columns were String type. ClickHouse was already compressing them well, but we knew we could do better by being specific about what our data actually looked like.

We went column by column:

  • Columns with a few hundred unique values across 170 million rows? Wrapped them in LowCardinality. Instead of storing the full string per row, ClickHouse builds a dictionary and stores a small integer index per row—one or two bytes instead of the whole string. The savings are absurd when a few hundred unique values repeat across 170 million rows.

  • Columns with a tiny fixed set of possible values? Replaced with Enum8—one byte per row, with schema-level validation that rejects bad data at insert time.

  • Columns with fixed-length values? Switched to FixedString(N). No length prefix overhead. Uniform arrays that compress better than variable-length data.

  • Structured data stored as strings? Replaced with native types. Fixed-width binary representations that are smaller and denser.
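As a sketch, the changes looked something like this—column names and value sets here are illustrative, not our actual schema:

```sql
-- Replace generic String columns with types that match the data.
-- All names and value sets below are assumptions for illustration.
ALTER TABLE events
    MODIFY COLUMN country    LowCardinality(String),                           -- a few hundred unique values
    MODIFY COLUMN device     Enum8('desktop' = 1, 'mobile' = 2, 'tablet' = 3), -- tiny fixed set
    MODIFY COLUMN short_code FixedString(8),                                   -- always 8 characters
    MODIFY COLUMN ip         IPv4;                                             -- was a dotted-quad String
```

MODIFY COLUMN rewrites the data in the new representation, so the conversion only succeeds if every existing value actually fits the tighter type—which is exactly the validation you want.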

These changes were boring. No algorithmic breakthroughs. Just telling ClickHouse what we already knew about our data.

The Numbers

Same 170 million rows. Same data. Same queries. Same dashboards.

Metric              Before       After        Change
Compressed (disk)   2.71 GiB     166 MiB      -93.9%
Uncompressed        16.69 GiB    14.14 GiB    -15.2%
Compression ratio   6.16x        88.6x        14x better
Per-row (disk)      16.8 bytes   1.02 bytes   -94%

From 6x compression to 89x compression. From 16.8 bytes per row to just over 1 byte per row on disk.

The entire analytics table—170 million click events with full metadata—now fits in 166 MiB. That's smaller than what the UUID column alone used to cost.

One Byte Per Row Changes Everything

At ~1 byte per row compressed, the scaling math gets fun:

Rows             Disk
170M (current)   166 MiB
500M             ~490 MiB
1 billion        ~1 GiB
10 billion       ~10 GiB

A billion rows in a gigabyte. Ten billion in ten. We went from worrying about disk provisioning to realizing storage is no longer a meaningful cost in our infrastructure.

It's Faster Too

Less data on disk means less data to read. We ran some queries on the optimized table to see what that looks like in practice:

Query                                 Rows Scanned   Time
Full-column scan, 1M rows             1,000,000      47ms
30-day time-series                    170,500,000    284ms
Full-table multi-column aggregation   170,500,000    1.76s

A 30-day time-series across all 170 million rows in 284 milliseconds. A full-table multi-column aggregation in under 2 seconds. A million-row scan in 47ms.
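The time-series query was nothing exotic—something in the shape of this sketch (timestamp and table names are assumptions, not our actual schema):

```sql
-- Daily click counts over the last 30 days.
-- `events` and `ts` are illustrative names.
SELECT
    toDate(ts) AS day,
    count()    AS clicks
FROM events
WHERE ts >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```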

Before the optimization, 2.55 GiB of random UUID data rode along with the table through every insert, merge, and backup, crowding real columns out of the page cache. Now the entire table on disk is smaller than that one column used to be.

Dictionary-encoded columns also get compared using integer operations instead of string matching. Less data to read and cheaper comparisons per row. The wins compound.

The Lesson

We almost didn't look. The table was working. Queries were fast enough. Disk wasn't full yet. We would have happily run at 6x compression for months without questioning it.

The fix took one afternoon:

  1. Drop the column nobody queried. A random UUID in a columnar analytics database is almost pure waste. If you're not filtering, grouping, or selecting by it—it's just dead weight that every insert, merge, and backup has to carry. In our case, it was 94% of our storage cost.

  2. Use specific types instead of String. LowCardinality for columns with few unique values. Enum8 for tiny fixed sets. FixedString for fixed-length data. Native types for structured data. Don't make the compressor guess what you already know.

The uncompressed data only dropped 15%—from 16.69 GiB to 14.14 GiB. The real magic was in compression. By removing incompressible data and giving ClickHouse better type hints, the ratio jumped from 6x to 89x.

Your table probably has a column like our UUID. Something that seemed reasonable, that nobody questions, that's quietly eating your disk budget. Go check system.columns and sort by data_compressed_bytes. You might be surprised what you find.


QCK is a URL shortener with built-in real-time analytics. Try it free — 25 links, full analytics, no credit card.