
# Why I Store All My Scraped Data in SQLite (Not JSON, Not CSV)
For two years I saved scraped data as JSON files. One file per run. Sometimes CSV. Then my projects grew, and JSON became a nightmare:

- 500 JSON files in a directory
- No way to query across runs
- Duplicate detection? Manual diffing
- Data grew to 2 GB+ and `grep` took minutes

I switched everything to SQLite. Here's why, and the exact pattern I use.

## Why SQLite?

- **It's a single file.** Your entire database is `data.db`. Copy it, back it up, email it.
- **It's built into Python.** `import sqlite3`: no install, no server, no Docker.
- **SQL queries.** Need prices from last week? `WHERE scraped_at > '2026-03-19'`. Try that with 500 JSON files.
- **It handles millions of rows.** SQLite comfortably handles 10M+ rows on a laptop.
- **It's fast.** Inserts: 100K rows/second. Queries: milliseconds for most workloads.

## The Pattern I Use Everywhere

```python
import sqlite3
import json
from datetime import datetime

class ScrapingDB:
    def __init__(self, db_path='data.db'):
        self.conn = sqlite3.connect(db_path)
        self.conn.row_factory = sqlite3.Row  # access columns by name
```
Continue reading on Dev.to




