Finding meaning in text, an experiment in document clustering

Problem

For an assignment in the University of British Columbia's CPSC330 course in Applied Machine Learning, we were tasked with categorizing titles pulled from a sample of Food.com recipes. The goal was simple: use a subset of the bank's 180,000+ recipes to find categories of recipes based purely on their titles. Achieving that goal was the real challenge, because the data is text, and that shaped many of the decisions in the modeling process.

The data

From our sample of recipes, we pulled a smaller subset of data consisting of 9100 words. We did this by removing duplicate entries, NaNs, and short names (< 5 characters), and by keeping only observations with tags that were among the top 300 tags in our sample. Below we can see what this unprocessed data of title names looks like, and a visualization of the words within the dataset.

Index  Recipe Name
42     i yam what i yam muffins
101    to your health muffins
129    250 00 chocolate chip cookies
138    lplermagronen
163    california ro
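
The filtering steps described above (dropping duplicates, missing names, names shorter than 5 characters, and recipes outside the top 300 tags) can be sketched in plain Python. This is only an illustration under assumptions, not the code used for the assignment: the `name` and `tags` record fields, the `filter_recipes` and `is_missing` helpers, and the tags in the sample are all hypothetical, and the sketch assumes a recipe is kept when at least one of its tags is among the most frequent ones.

```python
import math
from collections import Counter


def is_missing(value):
    """True for None or a float NaN, the usual forms a missing name takes."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_recipes(recipes, min_name_len=5, top_n_tags=300):
    """Keep recipes with a unique, non-missing, long-enough name and a common tag.

    `recipes` is a list of dicts with hypothetical keys "name" and "tags".
    """
    # Count every tag across the sample to find the most frequent ones.
    tag_counts = Counter(tag for r in recipes for tag in r["tags"])
    top_tags = {tag for tag, _ in tag_counts.most_common(top_n_tags)}

    seen_names = set()
    kept = []
    for recipe in recipes:
        name = recipe["name"]
        if is_missing(name):
            continue  # drop NaNs / missing names
        if len(name) < min_name_len:
            continue  # drop short names (< 5 characters by default)
        if name in seen_names:
            continue  # drop duplicate entries, keeping the first occurrence
        seen_names.add(name)
        # Assumption: one tag among the top tags is enough to keep the recipe.
        if top_tags.intersection(recipe["tags"]):
            kept.append(recipe)
    return kept


# Tiny sample; the tags are invented for illustration only.
sample = [
    {"name": "to your health muffins", "tags": ["muffins", "healthy"]},
    {"name": "to your health muffins", "tags": ["muffins"]},  # duplicate
    {"name": float("nan"), "tags": ["muffins"]},              # missing name
    {"name": "pie", "tags": ["muffins"]},                     # too short
    {"name": "lplermagronen", "tags": ["swiss"]},             # rare tag
]
kept = filter_recipes(sample, top_n_tags=1)
print([r["name"] for r in kept])
```

With `top_n_tags=1` only the most frequent tag in the sample counts as common, so a single recipe survives. On the real data the same function would be called with the default of 300.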
