
The 5 Regex Patterns I Use in 90% of My Projects
Regex is hard. But you only need 5 patterns.

After 3 years of building scrapers and data pipelines, I use the same 5 regex patterns in almost every project.

1. Extract Email Addresses

```python
import re

text = 'Contact us at hello@company.com or support@company.co.uk'
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
# ['hello@company.com', 'support@company.co.uk']
```

2. Extract URLs

```python
text = 'Visit https://example.com/page?q=test or http://api.site.io/v2'
urls = re.findall(r'https?://[\w.-]+(?:/[\w./?=&%-]*)?', text)
# ['https://example.com/page?q=test', 'http://api.site.io/v2']
```

3. Extract Numbers (including decimals and negatives)

```python
text = 'Price: $29.99, discount: -5.50, items: 3'
numbers = re.findall(r'-?\d+\.?\d*', text)
# ['29.99', '-5.50', '3']
```

4. Clean Whitespace (multiple spaces, tabs, newlines → single space)

```python
text = ' Too many \n\n spaces here \t\t tabs '
clean = re.sub(r'\s+', ' ', text).strip()
# 'Too many spaces here tabs'
```

5. Extract Conten
Continue reading on Dev.to Python
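One thing to watch with patterns 1 to 3 is how they handle a trailing `.`. Each of them allows a match to end on a dot, so an email, URL or number that closes a sentence picks up the full stop. The sketch below shows this with the original patterns on a made-up sample sentence. `EMAIL_STRICT`, `NUMBER_STRICT` and the `rstrip` cleanup for URLs are hypothetical fixes added for illustration. They are not part of the original five patterns.

```python
import re

# Patterns 1, 2 and 3 exactly as given above.
EMAIL = r'[\w.+-]+@[\w-]+\.[\w.]+'
URL = r'https?://[\w.-]+(?:/[\w./?=&%-]*)?'
NUMBER = r'-?\d+\.?\d*'

# Hypothetical tightened variants (an assumption, not from the original list):
# every dot in the email domain must be followed by at least one more
# label character, and a decimal point only counts when a digit follows it.
EMAIL_STRICT = r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+'
NUMBER_STRICT = r'-?\d+(?:\.\d+)?'

# Made-up sample: each value sits at the end of a sentence.
text = 'Write to hello@company.com. Docs: https://example.com. We ship 3.'

loose_emails = re.findall(EMAIL, text)           # ['hello@company.com.']
strict_emails = re.findall(EMAIL_STRICT, text)   # ['hello@company.com']

# The URL pattern has the same habit; stripping trailing punctuation
# from each match is a simple post-processing workaround.
loose_urls = re.findall(URL, text)                    # ['https://example.com.']
clean_urls = [u.rstrip('.,;:') for u in loose_urls]   # ['https://example.com']

loose_numbers = re.findall(NUMBER, text)          # ['3.']
strict_numbers = re.findall(NUMBER_STRICT, text)  # ['3']
```

On the sample strings from patterns 1 and 3, both strict variants return the same results as the originals. Neither number pattern captures a leading-dot value such as `.5` in full; both return `'5'`. If your data contains values like that, the pattern needs an extra alternative.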




