The wine industry is awash with data. At any one time, there are hundreds of thousands of wines across the globe, each associated with features such as varietal, price, year, ratings, reviews, and yield, as well as some of the underlying drivers such as climate and soil. It is a rich data set reflecting a complex, fickle, and sensitive product across space and time. However, I see relatively little data science happening, at least in public forums.
Sure, there’s some simple exploratory data analysis, such as vintage charts that tally a mysterious and subjective rating over time. The UCI wine quality data set is a common sight in data science courses. And there are a few people doing some interesting things with visualization (such as this and this) and recommendation engines. However, it is surprisingly little given the size of the industry and the global breadth and appeal of wine.
This blog is an attempt to satisfy my own curiosity, to develop some predictive models and data-driven insights, and to teach myself a few new machine learning tricks. I love wine. I love data science. So, why not?