
The Perfect Docker Setup for Web Scraping (I Spent Months Getting This Right)
I've dockerized 20+ scraping projects. Every time, I hit the same problems:

- Playwright browsers bloating the image to 2 GB+
- Chrome crashing with "out of memory" in containers
- Different behavior between local and production
- Slow builds when changing one line of code

Here's the Dockerfile I now use for every project. It took months of pain to get right.

The Dockerfile

```dockerfile
# Stage 1: Dependencies (cached layer)
FROM python:3.12-slim AS deps
WORKDIR /app

# System deps for Playwright/Chrome
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnss3 libatk1.0-0 libatk-bridge2.0-0 libdrm2 \
    libxkbcommon0 libxcomposite1 libxdamage1 libxrandr2 \
    libgbm1 libasound2 libpango-1.0-0 libcairo2 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install ONLY Chromium (not all browsers)
RUN playwright install chromium --with-deps

# Stage 2: App
FROM deps AS app
WORKDIR /app
COPY . .

# Non-root user (important for security)
RUN useradd --create-home scraper
USER scraper
```
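The "out of memory" crashes mentioned above are commonly caused by Chromium exhausting the container's default 64 MB `/dev/shm`, not by the image itself. A minimal compose sketch that addresses this at runtime (the service name, image context, and limits here are illustrative assumptions, not from the article):

```yaml
# docker-compose.yml (sketch; names and limits are placeholders)
services:
  scraper:
    build: .
    # Chromium maps renderer memory into /dev/shm; the 64 MB Docker
    # default is a frequent cause of tab crashes in containers.
    shm_size: "1gb"
    # Hard ceiling so a runaway browser gets OOM-killed instead of
    # starving the host.
    mem_limit: 2g
    # Run an init process as PID 1 so orphaned Chromium helper
    # processes are reaped instead of accumulating as zombies.
    init: true
```

The same effect can be had with plain `docker run` via `--shm-size=1g`, `--memory=2g`, and `--init`.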
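The "slow builds" problem is only half solved by the layer ordering above (`COPY requirements.txt` before `COPY . .`): any file that changes in the build context still invalidates the final `COPY . .` layer. A `.dockerignore` keeps churn-heavy paths out of the context; this is a sketch, and the `output/` entry is an assumed location for scraped data, not something the article specifies:

```
# .dockerignore (sketch)
.git
__pycache__/
*.pyc
.venv/
output/
Dockerfile
docker-compose.yml
```

Excluding scraped output and local virtualenvs also keeps the context small, which speeds up the build's initial context transfer.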
Continue reading on Dev.to




