
Building pandasclean — a pandas data cleaning library from scratch to PyPI
How I Built and Published My First Python Library as a Semester 4 Student

Every data project I start looks the same. Load the data, spend 30 minutes hunting for outliers, write the same NaN handling code I wrote last week, watch my notebook eat RAM. Then repeat it all for the next project.

I got tired of it. So I built a library.

This is the story of how I went from a frustrated CS student to publishing pandasclean on PyPI, and what I learned along the way.

The Idea

It started simple. I just wanted a function that could detect outliers and let me choose what to do with them. But once I had that, I thought: why not add NaN handling? And memory reduction? And a single function that runs everything?

Three weeks later I had a published library.

What pandasclean Does

pip install pandasclean

It has four core functions:

1. find_outliers(): IQR-based outlier detection

from pandasclean import find_outliers

# Just show me the bounds
df, bounds = find_outliers(df, strategy='report')
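For readers who have not met the IQR rule before, here is a minimal sketch of the general technique in plain pandas. It is not pandasclean's actual implementation: the helper names `iqr_bounds` and `flag_outliers`, the multiplier `k = 1.5`, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def iqr_bounds(df: pd.DataFrame, k: float = 1.5) -> dict:
    """Return {column: (lower, upper)} for each numeric column.

    Hypothetical helper: uses the common Q1 - k*IQR / Q3 + k*IQR fences.
    """
    bounds = {}
    for col in df.select_dtypes(include="number").columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        bounds[col] = (q1 - k * iqr, q3 + k * iqr)
    return bounds


def flag_outliers(df: pd.DataFrame, bounds: dict) -> pd.Series:
    """Return a boolean mask that is True for rows outside any column's fences."""
    mask = pd.Series(False, index=df.index)
    for col, (lower, upper) in bounds.items():
        mask |= (df[col] < lower) | (df[col] > upper)
    return mask


# Made-up sample data: one obviously extreme price.
sample = pd.DataFrame({"price": [10, 12, 11, 13, 12, 300]})
bounds = iqr_bounds(sample)
outlier_mask = flag_outliers(sample, bounds)
print(bounds)
print(sample[outlier_mask])
```

Keeping the bounds separate from the mask mirrors the "report first, act later" idea: you can inspect the fences before deciding whether to drop, cap, or keep the flagged rows.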
Continue reading on Dev.to Python



