Playwright for Web Scraping: When You Need More Than Cheerio

via Dev.to Webdev | Алексей Спинов

Cheerio handles static HTML. But when you need JavaScript rendering, login forms, or infinite scroll, you need Playwright.

Install

```
npm install playwright
```

Basic Scraping

```javascript
const { chromium } = require('playwright');

async function scrape(url) {
  // Launch a headless Chromium instance and open a fresh tab.
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  // Wait until network activity settles so client-rendered content is present.
  await page.goto(url, { waitUntil: 'networkidle' });

  // Runs inside the browser: collect title and price from each .item element.
  const data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.item')).map(el => ({
      title: el.querySelector('h2')?.textContent?.trim(),
      price: el.querySelector('.price')?.textContent?.trim()
    }));
  });

  await browser.close();
  return data;
}
```

Handle Infinite Scroll

```javascript
async function scrollAndScrape(page, maxScrolls = 10) {
  let previousHeight = 0;
  for (let i = 0; i < maxScrolls; i++) {
    // Jump to the bottom of the page, then give new content time to load.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);
    // …
```
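The excerpt's `scrollAndScrape` cuts off before `previousHeight` is used. One plausible way to finish a loop like this is to stop once the page height no longer grows. The sketch below is an assumption, not the author's code. The function name `scrollUntilStable` and the `delayMs` parameter are hypothetical, and it relies only on `page.evaluate` and `page.waitForTimeout`.

```javascript
// Sketch: scroll to the bottom repeatedly until the document height stops
// growing or maxScrolls is reached. `page` is expected to be a Playwright Page.
// `scrollUntilStable` and `delayMs` are illustrative names, not from the article.
async function scrollUntilStable(page, maxScrolls = 10, delayMs = 2000) {
  let previousHeight = 0;
  let scrolls = 0;

  for (let i = 0; i < maxScrolls; i++) {
    // Measure the current document height inside the browser.
    const currentHeight = await page.evaluate(() => document.body.scrollHeight);

    // If the height did not change since the last pass, nothing new loaded.
    if (currentHeight === previousHeight) break;
    previousHeight = currentHeight;

    // Scroll to the bottom and give lazy-loaded content time to arrive.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(delayMs);
    scrolls++;
  }

  // Number of scrolls actually performed.
  return scrolls;
}
```

After the loop finishes, the same `page.evaluate` extraction used in the basic example can collect the loaded items. A fixed delay is the simplest option here. Waiting for a specific selector or network response is usually more reliable when the target page exposes one.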

Continue reading on Dev.to Webdev
