
I wasted 200+ hours parsing client CSVs. So I built a library that does it in one line.
Every data person has the same nightmare. German client sends a CSV with semicolons, DD.MM.YYYY dates, and European number formatting. French client sends commas with DD/MM/YYYY. US client sends MM/DD/YYYY. You open each one with pandas.read_csv() and it silently corrupts all three.

I spent two years writing the same "detect encoding, guess delimiter, figure out date format" code for every new client. Last month I finally snapped and built a library.

The problem

Here's a typical European export file:

    Kunden-Nr;Name;Datum;Umsatz;Aktiv
    00742;Müller GmbH;01.03.2025;1.234,56;Ja
    00123;Schäfer AG;15.07.2024;789,00;Nein
    00456;Böhm & Co;25.12.2024;12.345,67;Ja

Semicolons as delimiters. Dots in dates. Commas as decimal separators. Leading zeros in IDs. Ja/Nein instead of True/False.

Watch what pandas does to it:

    >>> import pandas as pd
    >>> df = pd.read_csv("german_export.csv")
    >>> print(df)
    Kunden-Nr;Name;Datum;Umsatz;Aktiv
    00
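For comparison, here is roughly what the per-client, hand-written approach looks like for the German sample above. This is a minimal sketch using only standard pandas options, not the library this post is about. The helper name `read_german_csv` is made up for illustration, and the sample is inlined through `io.StringIO` so the snippet runs without a file on disk. It assumes you already know the delimiter, number format, and date format, which is exactly the per-client knowledge that has to be rediscovered every time.

```python
import io

import pandas as pd

# The sample export from above, inlined so the snippet is self-contained.
SAMPLE = """Kunden-Nr;Name;Datum;Umsatz;Aktiv
00742;Müller GmbH;01.03.2025;1.234,56;Ja
00123;Schäfer AG;15.07.2024;789,00;Nein
00456;Böhm & Co;25.12.2024;12.345,67;Ja
"""


def read_german_csv(source):
    """Hypothetical helper: read a German-style export with explicit settings."""
    df = pd.read_csv(
        source,
        sep=";",          # semicolon-delimited
        decimal=",",      # comma as decimal separator
        thousands=".",    # dot as thousands separator
        # Keep IDs and dates as text: IDs would lose their leading zeros,
        # and thousands="." could turn 01.03.2025 into the integer 1032025.
        dtype={"Kunden-Nr": str, "Datum": str},
        true_values=["Ja"],
        false_values=["Nein"],
    )
    # Parse dates with an explicit day-first format instead of guessing.
    df["Datum"] = pd.to_datetime(df["Datum"], format="%d.%m.%Y")
    return df


df = read_german_csv(io.StringIO(SAMPLE))
print(df.dtypes)
```

For a real file you would pass its path instead of the `StringIO` object and, if needed, an `encoding=` argument. Every one of these settings changes for the French and US files, which is why this code ends up rewritten for each client.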
Continue reading on Dev.to




