How to Extract Emails and Contacts from Any Website (Node.js)

via Dev.to Tutorial, by Алексей Спинов

Contact data extraction is one of the most requested scraping tasks. Here's a reliable approach.

The Regex Pattern

    const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
    const PHONE_REGEX = /\+[1-9]\d{0,2}[\s\-.]\(?\d{2,4}\)?[\s\-.]?\d{3,4}[\s\-.]?\d{3,4}|\(\d{3}\)[\s\-.]?\d{3}[\s\-.]?\d{4}|\b\d{3}[\-.]\d{3}[\-.]\d{4}\b/g;

Full Extractor

    const cheerio = require('cheerio');

    async function extractContacts(url) {
      const res = await fetch(url, { headers: { 'User-Agent': 'ContactBot/1.0' } });
      const html = await res.text();
      const $ = cheerio.load(html);

      // Remove scripts/styles
      $('script, style').remove();
      const text = $('body').text();

      const emails = [...new Set(text.match(EMAIL_REGEX) || [])];
      const phones = [...new Set(text.match(PHONE_REGEX) || [])];

      // Also check mailto: links
      $('a[href^="mailto:"]').each((i, el) => {
        const email = $(el).attr('href').replace('
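The two patterns can be tried on plain text without fetching a page. The following is a minimal sketch, not the article's own code: `uniqueMatches`, `emailFromMailto`, and the sample string are hypothetical names introduced for illustration, and the `mailto:` handling is an assumption, since the original snippet is cut off at that step.

```javascript
// Patterns as given in the article (extraction spacing removed).
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const PHONE_REGEX = /\+[1-9]\d{0,2}[\s\-.]\(?\d{2,4}\)?[\s\-.]?\d{3,4}[\s\-.]?\d{3,4}|\(\d{3}\)[\s\-.]?\d{3}[\s\-.]?\d{4}|\b\d{3}[\-.]\d{3}[\-.]\d{4}\b/g;

// Hypothetical helper: every distinct match of a global regex,
// in order of first appearance.
function uniqueMatches(text, regex) {
  return [...new Set(text.match(regex) || [])];
}

// Hypothetical helper (assumed behavior): turn a mailto: href into a bare
// address by dropping the scheme and any ?subject=... query string.
function emailFromMailto(href) {
  return href.replace(/^mailto:/i, '').split('?')[0].trim();
}

// Made-up sample text standing in for the visible text of a page.
const sample =
  'Sales: sales@example.com or (555) 123-4567. ' +
  'Support: sales@example.com, 555-987-6543, +44 20 7946 0958';

const emails = uniqueMatches(sample, EMAIL_REGEX);
const phones = uniqueMatches(sample, PHONE_REGEX);
const linked = emailFromMailto('mailto:info@example.com?subject=Hello');

console.log({ emails, phones, linked });
```

Deduplicating through a `Set` keeps the first occurrence of each match, so an address repeated in a header and footer is reported once. Both patterns are permissive and will accept some strings that are not real contacts, so results are best treated as candidates to verify.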

Continue reading on Dev.to Tutorial
