I Wrote 82 Regex Replacements to Parse 6,933 Time Format Variations From a Government Dataset

Note: This article is also available in Japanese.

The Setup

Japan's Ministry of Health publishes a list of ~10,000 pharmacies that dispense emergency contraception. I built a search tool for it.

The dataset has an hours field. Business hours. How bad could it be?

Mon-Fri:9:00-18:00,Sat:9:00-13:00

Split on ",", split on ":", parse the range. One regex. Done.

First version coverage: 89.4%. Over 10% of entries failed to parse. Here's why.

The Horror: Free-Text Entry With No Schema

There's no format specification. Pharmacies across all 47 prefectures type whatever they want. Here are real entries that all mean "Monday to Friday, 9:00 to 18:00":

月-金:9:00-18:00 ← clean
月～金：９：００～１８：００ ← full-width everything
⽉-⾦:9:00-18:00 ← ...what?
月曜日～金曜日 9時～18時 ← kanji time notation
(月火水木金)9:00-18:00 ← parenthesis grouping
平日:9:00-18:00 ← "weekdays" in Japanese
月から金は9時から18時 ← literal prose

They all mean the same thing. My job: funnel all of them into a single canonical form. The function that does this calls .repl
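To make the failure mode concrete, here is a minimal Python sketch of the kind of one-regex first version described above. The name parse_hours_naive and the exact pattern are hypothetical, not the author's code; it assumes every comma-separated segment looks like days:HH:MM-HH:MM with ASCII punctuation and digits.

```python
import re
from typing import List, Optional, Tuple

# One pattern for a "days:HH:MM-HH:MM" segment (hypothetical).
# Uses [0-9] instead of \d because, in Python 3, \d on str patterns
# also matches full-width digits such as "９".
SEGMENT = re.compile(
    r"^(?P<days>[^:]+):(?P<start>[0-9]{1,2}:[0-9]{2})-(?P<end>[0-9]{1,2}:[0-9]{2})$"
)


def parse_hours_naive(text: str) -> Optional[List[Tuple[str, str, str]]]:
    """Split on ',' and match each segment; return None if any segment fails."""
    parsed = []
    for segment in text.split(","):
        match = SEGMENT.match(segment.strip())
        if match is None:
            return None  # this entry lands in the "failed to parse" share
        parsed.append((match["days"], match["start"], match["end"]))
    return parsed


if __name__ == "__main__":
    print(parse_hours_naive("Mon-Fri:9:00-18:00,Sat:9:00-13:00"))
    print(parse_hours_naive("月～金：９：００～１８：００"))  # full-width colons: fails
```

With this pattern the clean entry 月-金:9:00-18:00 parses, but 月～金：９：００～１８：００ returns None because its colons are U+FF1A (full-width) rather than ASCII ':'.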
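One common way to build such a funnel, sketched here in Python as an assumption rather than the author's actual function, is to Unicode-normalize first and then apply an ordered table of regex replacements. unicodedata.normalize("NFKC", ...) folds full-width digits, colons and tildes to ASCII, and it also maps look-alike Kangxi radicals such as ⽉ (U+2F49) and ⾦ (U+2FA6) back to the ordinary kanji 月 (U+6708) and 金 (U+91D1). The RULES table and the normalize_hours name are hypothetical; these seven rules only cover the seven sample entries, not the full dataset.

```python
import re
import unicodedata

# Hypothetical, illustrative rule table: (pattern, replacement) pairs applied
# in list order. Not the author's 82 rules, just enough for the samples.
RULES = [
    (re.compile(r"[~\u301c]"), "-"),             # ASCII tilde (after NFKC) and wave dash -> hyphen
    (re.compile(r"から"), "-"),                  # "kara" (from/to) in prose -> hyphen
    (re.compile(r"曜日"), ""),                   # 月曜日 -> 月
    (re.compile(r"平日"), "月-金"),              # "weekdays" -> Mon-Fri
    (re.compile(r"\(月火水木金\)"), "月-金:"),   # (月火水木金) grouping -> range plus separator
    (re.compile(r"([0-9]{1,2})時"), r"\1:00"),   # 9時 -> 9:00
    # Day/time separator: "金 9" or "金は9" -> "金:9"
    (re.compile(r"([月火水木金土日])(?:は|\s+)(?=[0-9])"), r"\1:"),
]


def normalize_hours(text: str) -> str:
    # NFKC: full-width ASCII variants -> ASCII, Kangxi radicals -> unified kanji.
    text = unicodedata.normalize("NFKC", text)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    samples = [
        "月-金:9:00-18:00",
        "月～金：９：００～１８：００",
        "\u2f49-\u2fa6:9:00-18:00",  # Kangxi radical look-alikes
        "月曜日～金曜日 9時～18時",
        "(月火水木金)9:00-18:00",
        "平日:9:00-18:00",
        "月から金は9時から18時",
    ]
    for sample in samples:
        print(normalize_hours(sample))  # each prints 月-金:9:00-18:00
```

Because each rule sees the output of the ones before it, handling the next variant usually means appending one more pair to the list rather than touching the parser. NFKC is lossy (it also rewrites things like half-width katakana), so it is probably worth keeping the raw string around for display.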