We built an open Neo4j expert dataset — here's what we learned

via Dev.to Python · gibs-dev

We're building GibsGraph, an open-source tool that lets you query any Neo4j graph in plain English, or build new graphs from unstructured text. To generate good Cypher, the agent needs real Neo4j expertise. Not LLM training data. Actual documentation, patterns, and best practices.

So we built a curated expert dataset from scratch. 920 records. 5 categories. Fully bundled as JSONL, no setup needed. Here's what we learned along the way.

What's in the dataset

We parsed the official Neo4j documentation (the Cypher manual, modeling guides, knowledge base articles) into structured records:

| Type | Count | Source |
| --- | --- | --- |
| Cypher examples | 446 | Official docs, parsed from AsciiDoc |
| Best practices | 318 | Knowledge base articles, modeling guides |
| Cypher functions | 133 | Cypher manual function reference |
| Cypher clauses | 36 | Cypher manual clause reference |
| Modeling patterns | 23 | Data modeling docs + curated additions |

Each record has a source_file tracing back to the original documentation, an authority_level (1 = official
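To make the record format concrete, here is a minimal sketch of reading JSONL records and grouping them by category. Only `source_file` and `authority_level` are field names taken from the article; the `type` and `text` fields, the sample values, and the helper functions are assumptions made up for illustration, not the actual GibsGraph schema or API.

```python
import json
from collections import Counter
from io import StringIO

# Illustrative records only; NOT taken from the real dataset.
# "source_file" and "authority_level" are named in the article;
# "type" and "text" are hypothetical field names.
SAMPLE_JSONL = """\
{"type": "cypher_example", "text": "MATCH (n:Person) RETURN n LIMIT 5", "source_file": "example-a.adoc", "authority_level": 1}
{"type": "best_practice", "text": "Use parameters instead of string concatenation.", "source_file": "example-b.adoc", "authority_level": 1}
{"type": "modeling_pattern", "text": "Model events as intermediate nodes.", "source_file": "example-c.md", "authority_level": 2}
"""


def load_records(stream):
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    return [json.loads(line) for line in stream if line.strip()]


def count_by_type(records):
    """Count records per category, using the hypothetical 'type' field."""
    return Counter(record["type"] for record in records)


def records_at_level(records, level):
    """Keep only records whose authority_level equals the given level."""
    return [record for record in records if record["authority_level"] == level]


records = load_records(StringIO(SAMPLE_JSONL))
counts = count_by_type(records)
official = records_at_level(records, 1)  # 1 = official, per the article
```

Because JSONL is one self-contained JSON object per line, a bundled file can be read the same way by passing an open file handle instead of the `StringIO` wrapper.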
