
Searching Case Law PDFs with RAG — A Legal AI Search System using Gemini + SQLite FTS5
Challenges in Case Law Search

Traditional court databases (e.g., courts.go.jp) offer only string-matching search, and they provide no way to search the text inside PDFs. Because the PDFs in case law collections are not extracted as text data, relevant cases often fail to turn up even when a user enters a keyword such as "民法90条" (Civil Code Article 90). In addition, there was no way to automatically extract the "争点" (issues) and "判示事項" (points of law) that lie at the heart of a judgment, so legal professionals had to read each judgment and work these out by hand.

This system addresses these problems by converting PDFs into searchable text, providing fast full-text search with SQLite's FTS5 extension, and automating case law analysis with Gemini.

System Architecture

The system is built as a four-stage pipeline.

PDF Parsing

PDFs are converted to text with PyPDF2. Scanned judgments are additionally run through OCR (Tesseract), and an index is built in the SQLite database…
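The article does not show its parsing code, so here is a minimal sketch of the PDF Parsing stage under stated assumptions. It assumes that a page whose PyPDF2 text layer is nearly empty is a scanned image; such pages are rendered with pdf2image (which needs Poppler) and passed to Tesseract through pytesseract with the Japanese `jpn` model. The threshold `MIN_CHARS_PER_PAGE`, the NFKC normalization step, and all helper names are hypothetical, not the author's implementation.

```python
import unicodedata

# Hypothetical threshold: a page whose text layer yields fewer characters
# than this is assumed to be a scanned image and is sent to OCR instead.
MIN_CHARS_PER_PAGE = 20


def normalize(text: str) -> str:
    """Fold full-width characters and drop PDF layout line breaks."""
    # NFKC turns full-width digits such as "９０" into ASCII "90".
    text = unicodedata.normalize("NFKC", text)
    # Japanese has no spaces between words, so lines are joined without a separator.
    return "".join(line.strip() for line in text.splitlines())


def needs_ocr(text: str) -> bool:
    """Heuristic (assumption): treat a page with almost no extractable text as scanned."""
    return len(text.strip()) < MIN_CHARS_PER_PAGE


def extract_pdf_text(path: str) -> list[str]:
    """Return normalized text for each page, falling back to Tesseract OCR."""
    # Third-party imports are deferred so the helpers above work without them.
    from pdf2image import convert_from_path  # requires Poppler
    from PyPDF2 import PdfReader
    import pytesseract  # requires the Tesseract binary and "jpn" language data

    pages = []
    reader = PdfReader(path)
    for page_no, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text):
            # Render only this page, then OCR it with the Japanese model.
            image = convert_from_path(
                path, dpi=300, first_page=page_no, last_page=page_no
            )[0]
            text = pytesseract.image_to_string(image, lang="jpn")
        pages.append(normalize(text))
    return pages
```

The NFKC step matters for the search stage: Japanese documents frequently write numbers with full-width digits, and normalizing both indexed text and queries keeps "民法９０条" and "民法90条" equivalent.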
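A minimal sketch of the FTS5 index and keyword search, using Python's built-in `sqlite3` module. It assumes the `trigram` tokenizer (available from SQLite 3.34.0), because FTS5's default `unicode61` tokenizer splits only on whitespace and punctuation and would therefore treat an unspaced run like "本件契約は民法90条に違反し" as a single token. The table name `cases` and the helper functions are hypothetical.

```python
import sqlite3


def build_index(conn: sqlite3.Connection) -> None:
    # The trigram tokenizer (SQLite >= 3.34.0) indexes every 3-character
    # substring, so unspaced Japanese text can be searched by substring.
    # case_id is stored alongside the text but excluded from the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS cases "
        "USING fts5(case_id UNINDEXED, body, tokenize='trigram')"
    )


def add_case(conn: sqlite3.Connection, case_id: str, body: str) -> None:
    conn.execute("INSERT INTO cases (case_id, body) VALUES (?, ?)", (case_id, body))
    conn.commit()


def search(conn: sqlite3.Connection, term: str, limit: int = 10) -> list[str]:
    # Wrap the term in double quotes (doubling any embedded quotes) so FTS5
    # treats it as one phrase rather than parsing it as query syntax.
    phrase = '"' + term.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT case_id FROM cases WHERE cases MATCH ? ORDER BY rank LIMIT ?",
        (phrase, limit),
    ).fetchall()
    # rank is FTS5's built-in BM25 score; ascending order puts the best match first.
    return [case_id for (case_id,) in rows]
```

Trigram indexing trades index size for substring search. Query terms shorter than three characters (such as "民法") match no rows under this tokenizer, so two-character terms would need a fallback such as a `LIKE '%民法%'` scan.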
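A minimal sketch of the Gemini analysis step, using the `google-genai` Python SDK. The prompt wording, the JSON shape (`issues` for 争点, `holdings` for 判示事項), the model name `gemini-2.0-flash`, and the `GEMINI_API_KEY` environment variable are all assumptions; the article does not say how its prompts are structured. Parsing is kept separate from the API call so it can be tested offline.

```python
import json
import os
import re

# Hypothetical prompt; doubled braces become literal { } after str.format.
PROMPT_TEMPLATE = (
    "Extract the 争点 (issues) and 判示事項 (points of law) from the judgment below.\n"
    'Reply with JSON only, in the form {{"issues": [...], "holdings": [...]}}.\n\n'
    "Judgment:\n{text}"
)


def build_prompt(text: str) -> str:
    return PROMPT_TEMPLATE.format(text=text)


def parse_analysis(raw: str) -> dict:
    """Pull the JSON object out of the model reply, which may be fenced."""
    # Greedy match from the first "{" to the last "}" skips any surrounding fences.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    return {
        "issues": list(data.get("issues", [])),
        "holdings": list(data.get("holdings", [])),
    }


def analyze_case(text: str, model: str = "gemini-2.0-flash") -> dict:
    """Send one judgment to Gemini and return its issues and holdings."""
    from google import genai  # google-genai SDK, imported only when called

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=model, contents=build_prompt(text))
    return parse_analysis(response.text)
```

Extracting the first `{...}` span tolerates the Markdown fences models often wrap around JSON. The SDK also supports structured output (a response schema), which would be a sturdier choice than regex extraction.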
Continue reading on Dev.to Python

