
Boosting Lightweight ETL on AWS Lambda & Glue Python Shell with DuckDB and Apache Arrow Dataset
Original Japanese article: AWS Lambda/Glue Python Shell×DuckDBの軽量ETLをApache Arrow Datasetで高速化してみた

Introduction

I'm Aki, an AWS Community Builder (@jitepengin).

In my previous articles, I introduced lightweight ETL using AWS Lambda and Glue Python Shell. In the process, I found that DuckDB's performance was not as high as expected:

Does Increasing AWS Lambda Memory to 10GB Really Make It Faster? (AWS Lambda chDB/DuckDB PyIceberg Benchmark)
AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

In this article, I will cover what became the bottleneck for DuckDB and how using Apache Arrow Dataset can improve performance, along with the trade-offs observed.

Recap of Previous Articles

Using NYC taxi data, we compared performance on the same file: data.page] https://www.nyc.gov/site/tlc/about/tl
Continue reading on Dev.to


