
Performance considerations for loading data into BigQuery
Customers have been using BigQuery for their data warehousing needs since it was introduced. Many of these customers routinely load very large data sets into their Enterprise Data Warehouse. Whether one is doing an initial data ingestion with hundreds of TB of data or incrementally loading from systems of record, performance of bulk inserts is key to quicker insights from the data. The most common architecture for batch data loads uses Google Cloud Storage(Object storage) as the staging area for all bulk loads. All the different file formats are converted into an optimized Columnar format called ‘Capacitor’ inside BigQuery. This blog will focus on various file types for best performance. Data files that are uploaded to BigQuery, typically come in Comma Separated Values(CSV), AVRO, PARQUET, JSON, ORC formats. We are going to use two large datasets to compare and contrast each of these file formats. We will explore loading efficiencies of compressed vs. uncompressed data for each of the
Continue reading on Google Cloud Blog
Opens in a new tab




