
# build-my-own-datalake: Improve metadata with caching
## Building a High-Performance Metadata System with Global Caching

*Caching schema metadata at the JNI boundary to eliminate per-write filesystem reads*

I set a goal for project Vine: a write-optimized data lake format. What I didn't expect was that reading a small JSON file would become the bottleneck. My initial implementation was spending a significant chunk of each write just loading a schema definition from disk, repeatedly, on every operation. For a system handling thousands of writes per second, this compounds fast.

### The Initial Implementation

Like many data lake formats, Vine uses a metadata file to define table schemas:

```json
{
  "table_name": "user_events",
  "fields": [
    { "id": 1, "name": "user_id", "data_type": "long", "is_required": true },
    { "id": 2, "name": "event_type", "data_type": "string", "is_required": true }
  ]
}
```

Both read and write paths load this file on every call:

```rust
fn load_metadata(base_path: &str) -> Result<Metadata> {
    let meta_path = Path::new
```
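The excerpt cuts the loader off mid-line, so the following is only a plausible sketch of the per-call pattern the article describes, not Vine's actual code: the `Metadata` struct, the `metadata.json` file name, and the use of `std::io::Result` are all assumptions introduced here for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for Vine's parsed schema; the real type is
// defined elsewhere in the project.
#[derive(Debug, Clone)]
struct Metadata {
    raw_json: String,
}

// Sketch of the naive loader: join the base path with an assumed file
// name ("metadata.json" is a guess) and hit the filesystem on every
// call -- the per-write read the article identifies as the bottleneck.
fn load_metadata(base_path: &str) -> io::Result<Metadata> {
    let meta_path = Path::new(base_path).join("metadata.json");
    let raw_json = fs::read_to_string(&meta_path)?;
    Ok(Metadata { raw_json })
}
```

Because nothing is retained between calls, two consecutive writes to the same table pay the open/read/parse cost twice.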
Continue reading on Dev.to
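The excerpt ends before the fix, but the title points at the shape of the solution: a process-wide cache consulted before touching the filesystem. A minimal sketch of such a read-through cache in safe Rust, assuming std-only types (`OnceLock`, `RwLock`, `Arc`) and the same hypothetical `Metadata` struct and `metadata.json` file name as above — not the article's actual implementation:

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;
use std::sync::{Arc, OnceLock, RwLock};

// Hypothetical stand-in for Vine's parsed schema.
#[derive(Debug)]
struct Metadata {
    raw_json: String,
}

// Process-wide cache keyed by table base path. OnceLock gives a lazily
// initialized global without unsafe code or external crates.
static CACHE: OnceLock<RwLock<HashMap<String, Arc<Metadata>>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<String, Arc<Metadata>>> {
    CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

// Read-through lookup: the filesystem is touched only on a cache miss,
// so repeated writes against the same table skip the per-write read.
fn cached_metadata(base_path: &str) -> io::Result<Arc<Metadata>> {
    if let Some(meta) = cache().read().unwrap().get(base_path) {
        return Ok(Arc::clone(meta));
    }
    // Assumed file name; the excerpt truncates before the real path is built.
    let raw_json = fs::read_to_string(Path::new(base_path).join("metadata.json"))?;
    let meta = Arc::new(Metadata { raw_json });
    cache()
        .write()
        .unwrap()
        .insert(base_path.to_string(), Arc::clone(&meta));
    Ok(meta)
}
```

Handing out `Arc<Metadata>` means callers share one immutable parsed schema instead of each owning a copy, which is also the natural shape for handing references across a JNI boundary. A real implementation would additionally need an invalidation story for schema evolution, which this sketch deliberately omits.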




