Applied Data Science @ Columbia

Where shall I start

Jan 29, 2016

I had a conversation with a student who recently participated in two hackathons. I asked her which part of the hackathons she found the hardest. Unsurprisingly, she said the hardest part was figuring out where to start with the data set. She thought she didn't know where to start because she had not yet acquired a systematic view of statistics, and that she "needs to take more courses."

My response was: "Oh, no, definitely no more courses. Which course would teach you how to explore a data set?" (EDAV excepted.)

Data analysis involves a complicated decision process. Most courses in a statistics curriculum do not involve such challenges. If you are taking a course on "linear regression" with me and I assign a data analysis problem as homework, you will use linear regression to analyze the data. The chapter titles give good hints: "multiple regression," "model selection," and so on.

It is precisely for this reason that I decided to offer this course: students will be challenged to make complex decisions, with discussion and mutual support from teammates and some inspirational tutorials as starting points. "Where shall I start?" The answer in this course is "Anywhere." It is a good thing that we are still in a classroom environment. Nothing really bad will happen if you choose a "wrong" place to start, and you will still learn valuable skills and lessons that you will remember. It is also a good thing that you are making such "wrong" decisions now, at school, in a class. Out there, such decisions can cost far more than a couple of points on a project score.