
Scraping Government Data: Public Records, APIs, and FOIA Data
Government websites are treasure troves of public data. From business registrations to court filings, much of this data is public by law, making it one of the most legitimate scraping targets.

Scraping Open Data Portals

Many portals expose CKAN or Socrata APIs. The example below queries the CKAN package_search endpoint on catalog.data.gov:

    import requests
    import pandas as pd

    class GovDataScraper:
        def search_data_gov(self, query, rows=50):
            url = 'https://catalog.data.gov/api/3/action/package_search'
            resp = requests.get(url, params={'q': query, 'rows': rows})
            resp.raise_for_status()  # fail early on HTTP errors
            data = resp.json()
            datasets = []
            for r in data['result']['results']:
                # Keep only resources in machine-readable formats.
                # `or ''` guards against a null format field.
                resources = [
                    {'url': res['url'], 'format': res.get('format') or 'N/A'}
                    for res in r.get('resources', [])
                    if (res.get('format') or '').upper() in ('CSV', 'JSON', 'XML')
                ]
                datasets.append({
                    'title': r['title'],
                    # `organization` can be null, so fall back to an empty dict
                    'org': (r.get('organization') or {}).get('title', 'N/A'),
                    'resources': resources,
                })
            return datasets

        def download_csv(self, url):
            ...  # body cut off in the source
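The excerpt cuts off inside download_csv, so its actual body is unknown. As a rough sketch of what fetching a CSV resource could look like, the snippet below uses only the standard library instead of the requests and pandas imports in the article. The helper names (parse_csv_text, download_csv_rows), the 30-second timeout, the UTF-8 assumption, and the sample rows are all hypothetical and chosen for illustration.

```python
import csv
import io
import urllib.request


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_csv_rows(url, timeout=30):
    """Fetch a CSV resource over HTTP and return its rows as dicts.

    Hypothetical helper: the name, the timeout default, and the
    encoding choice are assumptions, not taken from the article.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark
        text = resp.read().decode("utf-8-sig")
    return parse_csv_text(text)


if __name__ == "__main__":
    # Made-up sample data, parsed offline so no network call is needed
    sample = "name,state\nAcme LLC,OH\nBeta Inc,TX\n"
    for row in parse_csv_text(sample):
        print(row["name"], row["state"])
```

Splitting parsing from fetching keeps the parsing step testable without network access. With pandas installed, passing the resource URL straight to pd.read_csv would be a common alternative that returns a DataFrame instead of a list of dicts.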




