Back to articles
Stop Scraping HTML — Use These Hidden JSON APIs Instead

Stop Scraping HTML — Use These Hidden JSON APIs Instead

via Dev.to PythonAlex Spinov

I used to write BeautifulSoup parsers for every website. CSS selectors, XPath, regex on HTML... Then I discovered that most websites have hidden JSON APIs that return clean, structured data. No parsing needed. How to Find Hidden APIs Open Chrome DevTools → Network tab Filter by XHR/Fetch Browse the website normally Watch for JSON responses You'll be amazed how many sites serve their data through internal APIs. Real Examples YouTube (Innertube API) YouTube's website doesn't scrape its own HTML. It uses an internal API called Innertube : import requests resp = requests . post ( " https://www.youtube.com/youtubei/v1/search " , json = { " context " : { " client " : { " clientName " : " WEB " , " clientVersion " : " 2.20240101.00.00 " }}, " query " : " python tutorial " }) data = resp . json () # Clean JSON with all video data — no HTML parsing! No API key. No quotas. Same data YouTube shows you. Toolkit: youtube-innertube-toolkit GitHub Trending GitHub's trending page has no official API.

Continue reading on Dev.to Python

Opens in a new tab

Read Full Article
3 views

Related Articles