When you start dealing with large datasets, classic JSON files get annoying: slow to parse, heavy to store, and painful to stream. That’s where JSONL and GZ compression step in. Used together, they give you a lightweight, stream-friendly, storage-friendly format that’s perfect for logs, analytics pipelines, ML training data, backups, and anything high-volume.
Let’s break it down.
What Is JSONL?
JSON Lines (JSONL) is a format where each line is a separate JSON object.
Example:
jsonl{"id": 1, "name": "Alp"}
{"id": 2, "name": "Batuhan"}
{"id": 3, "name": "Cengiz"}
That’s it. No arrays, no giant closing brackets, no memory-eating parsing. Your app can read it line-by-line instead of loading the whole thing.
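As a quick sketch of what "read it line-by-line" means in practice, here's a minimal Go example (the Record struct and the data.jsonl filename are placeholders matching the sample lines above):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Record is a hypothetical shape matching the example lines above.
type Record struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func main() {
	file, err := os.Open("data.jsonl")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		var rec Record
		// Each line is a complete JSON document, so it parses independently.
		if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
			log.Printf("skipping bad line: %v", err)
			continue
		}
		fmt.Println(rec.ID, rec.Name)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

Only one line lives in memory at a time, and a malformed line is skipped instead of killing the whole parse.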
Why JSONL is great
- Streamable - perfect for real-time processing.
- Append-friendly - logs and events fit naturally.
- Resilient - one broken line doesn’t destroy the whole file.
- Easy to process - works with tools like jq, awk, or any language’s line reader.
What Is GZ?
GZ (gzip) is a fast, widely supported compression format built on DEFLATE. Nothing fancy - just a solid way to shrink your data.
Why it works so well with JSONL:
- JSONL is text-heavy.
- Heavy text compresses extremely well.
- Gzip is fast to decompress and can be streamed.
Why JSONL + GZ Together Rocks
Combine them and you get:
1. Massive Size Reduction
JSON compresses insanely well. A 1 GB JSONL file often shrinks to a ~100–200 MB .jsonl.gz.
2. Streamable Even When Compressed
You can read GZ line-by-line without extracting it fully:
- In Go: gzip.Reader + a bufio scanner
- In Node: zlib.createGunzip() + readline
- In Python: gzip.open()
This means huge datasets don’t blow up your RAM.
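Here’s roughly what that looks like in Go, wrapping gzip.Reader in a bufio scanner - a sketch with minimal error handling, assuming a file named data.jsonl.gz:

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("data.jsonl.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Decompress on the fly; only one line is held in memory at a time.
	gz, err := gzip.NewReader(file)
	if err != nil {
		log.Fatal(err)
	}
	defer gz.Close()

	scanner := bufio.NewScanner(gz)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // each line is one JSON object
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```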
3. Perfect for Logs, Analytics & ML
All major data pipelines love this format because it solves the “big file problem” without weird dependencies.
4. Easy to version, backup, and upload
Most cloud storage services (S3, GCS, etc.) handle .jsonl.gz smoothly for large data dumps, and many data tools can ingest it directly.
How to Create a JSONL + GZ File
Linux CLI
```bash
gzip data.jsonl
```
You get data.jsonl.gz (by default gzip replaces the original file; add -k to keep it).
Node.js
```js
const fs = require("fs");
const zlib = require("zlib");

const gzip = zlib.createGzip();

// Stream the file through gzip so nothing is loaded fully into memory.
fs.createReadStream("data.jsonl")
  .pipe(gzip)
  .pipe(fs.createWriteStream("data.jsonl.gz"));
```
Go
```go
// imports: compress/gzip, io, os (errors ignored for brevity)
writer, _ := os.Create("data.jsonl.gz")
defer writer.Close()
gz := gzip.NewWriter(writer)
defer gz.Close() // flushes compressed data before the file closes
file, _ := os.Open("data.jsonl")
defer file.Close()
io.Copy(gz, file)
```
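If you’re generating the records in code rather than compressing an existing file, you can also encode straight into the gzip writer. A sketch, assuming a hypothetical Record struct:

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"log"
	"os"
)

// Record is a hypothetical example type; swap in your own fields.
type Record struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

func main() {
	out, err := os.Create("data.jsonl.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	gz := gzip.NewWriter(out)
	defer gz.Close()

	// json.Encoder terminates every object with a newline,
	// which is exactly the JSONL layout.
	enc := json.NewEncoder(gz)
	for _, rec := range []Record{{1, "Alp"}, {2, "Batuhan"}, {3, "Cengiz"}} {
		if err := enc.Encode(rec); err != nil {
			log.Fatal(err)
		}
	}
}
```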
How to Read It (Streaming)
CLI
```bash
zcat data.jsonl.gz | jq .
```
Python
```python
import gzip, json

with gzip.open("data.jsonl.gz", "rt") as f:
    for line in f:
        print(json.loads(line))
```
Typical Use Cases
✔ Log aggregation
✔ Event pipelines
✔ ETL jobs
✔ ML training datasets
✔ Backups & syncing across environments
✔ High-volume analytics systems
If your dataset is too big for a classic JSON file but you still want human-readable, line-based records, this combo is basically unbeatable.
Final Thoughts
JSONL + GZ gives you simplicity + performance with very few trade-offs. It’s easy, portable, compresses like crazy, and scales smoothly. If you’re building anything that handles large streams of structured data, this format should be one of your defaults.